Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15297
Can one of the admins verify this patch?
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional
Github user deswal-ajit commented on the issue:
https://github.com/apache/spark/pull/15297
hi
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the
Github user YuhuWang2002 commented on the issue:
https://github.com/apache/spark/pull/15297
retest this please
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15297
Merged build finished. Test FAILed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15297
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69990/
Test FAILed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15297
**[Test build #69990 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69990/consoleFull)**
for PR 15297 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15297
**[Test build #69990 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69990/consoleFull)**
for PR 15297 at commit
Github user holdenk commented on the issue:
https://github.com/apache/spark/pull/15297
This is a really big change - and handling skewed data in joins is
certainly an important consideration - have you considered making a design
document and running it by the dev list? Maybe
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15297
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15297
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68448/
Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15297
**[Test build #68448 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68448/consoleFull)**
for PR 15297 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15297
**[Test build #68448 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68448/consoleFull)**
for PR 15297 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15297
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68446/
Test FAILed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15297
Merged build finished. Test FAILed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15297
**[Test build #68446 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68446/consoleFull)**
for PR 15297 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15297
**[Test build #68446 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68446/consoleFull)**
for PR 15297 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15297
Merged build finished. Test FAILed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15297
**[Test build #68442 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68442/consoleFull)**
for PR 15297 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15297
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68442/
Test FAILed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15297
**[Test build #68442 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68442/consoleFull)**
for PR 15297 at commit
Github user scwf commented on the issue:
https://github.com/apache/spark/pull/15297
@YuhuWang2002
We should limit the use case for outer joins:
For a left outer join, such as A left join B, this implementation currently
cannot handle skew in table B. That's because
Github user YuhuWang2002 commented on the issue:
https://github.com/apache/spark/pull/15297
I ran some performance tests comparing runs with and without the skew
join algorithm.
I generated 2 tables, with 1/5 data skew in table S and 1/1 data skew in
table R. Two table
Github user YuhuWang2002 commented on the issue:
https://github.com/apache/spark/pull/15297
The skewed join implementation is suitable for both DataFrame operations and SQL statements.
You will get 210 output files.
Github user tgravescs commented on the issue:
https://github.com/apache/spark/pull/15297
Ok so are you saying this skewed join implementation doesn't apply to other
dataframe operations, something like:
val df_pixels = sqlContext.read.parquet("somefile")
val
Github user YuhuWang2002 commented on the issue:
https://github.com/apache/spark/pull/15297
@tgravescs: In the join case, for something like: select count(*) from A join B, if
the parameter spark.sql.shuffle.partitions=200, then we get 200 tasks that output
the 'count num'; the output is not in
Github user tgravescs commented on the issue:
https://github.com/apache/spark/pull/15297
Ok so how does that affect the overall job and # of outputs? I don't know
the internals of Spark SQL so sorry if I'm missing something obvious.
Basically now you will have multiple tasks
Github user YuhuWang2002 commented on the issue:
https://github.com/apache/spark/pull/15297
@tgravescs:
Thank you for your response. When a single reduce task handles a huge amount of data,
it is slow and unstable, so we split one reduce task into multiple reduce tasks.
A single reduce
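
The idea of splitting one reduce task into several can be sketched in plain Python. This is only an illustration of the general technique, not code from the PR: the function names, the per-map-task list layout, and the sample data are all hypothetical, and it assumes an inner join in which only one side is skewed. Each sub-task reads a slice of the skewed side's map outputs and the whole of the other side.

```python
from collections import defaultdict


def join_partition(left_rows, right_rows):
    """Inner-join two lists of (key, value) rows from one reduce partition."""
    right_by_key = defaultdict(list)
    for key, value in right_rows:
        right_by_key[key].append(value)
    return [
        (key, left_value, right_value)
        for key, left_value in left_rows
        for right_value in right_by_key.get(key, [])
    ]


def split_skewed_join(left_map_outputs, right_rows, num_splits):
    """Run the join as num_splits sub-tasks instead of one large reduce task.

    left_map_outputs: one list of rows per map task for the skewed side
    (a hypothetical layout chosen for this sketch).
    right_rows: the other side's rows, read in full by every sub-task.
    """
    results = []
    for i in range(num_splits):
        # Sub-task i fetches map outputs i, i + num_splits, i + 2 * num_splits, ...
        chunk = [row for rows in left_map_outputs[i::num_splits] for row in rows]
        results.append(join_partition(chunk, right_rows))
    return results


# Hypothetical data: 4 map tasks each emit 3 rows for the skewed key "hot".
left_map_outputs = [[("hot", m * 10 + j) for j in range(3)] for m in range(4)]
right_rows = [("hot", "r1"), ("cold", "r2")]

all_left = [row for rows in left_map_outputs for row in rows]
single_task = join_partition(all_left, right_rows)
sub_tasks = split_skewed_join(left_map_outputs, right_rows, num_splits=2)
merged = [row for part in sub_tasks for row in part]
```

In this sketch the two sub-tasks together produce the same rows as the single task, each handling half of the skewed side, at the cost of reading the non-skewed side once per sub-task and producing one output per sub-task rather than one per partition.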
Github user tgravescs commented on the issue:
https://github.com/apache/spark/pull/15297
I haven't looked through the code in detail but can you clarify the design
a bit on this, the design pretty much just says we are splitting up the fetch
of the map outputs but it doesn't say what