Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/1242#issuecomment-68296918
The size of the data is 100GB in its uncompressed binary representation.
You are probably compressing the data when you save it as a sequence file.
When you save it as text
Github user liuqiyun commented on the pull request:
https://github.com/apache/spark/pull/1242#issuecomment-68331374
So how do I save the data in the uncompressed binary representation in the
GenSort.scala program? I want to compare it with Hadoop MR, which also uses the
uncompressed binary
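A minimal sketch of what "uncompressed binary representation" means in practice: if fixed-size 100-byte records are written back to back with no framing and no compression, the output is exactly 100 bytes times the record count, and any compression codec makes it smaller. This is plain Python for illustration, not GenSort.scala's code. The 10-byte key / 90-byte payload layout (the sortbenchmark.org convention) and the names `make_record` and `write_records` are assumptions.

```python
import gzip
import os
import random
import tempfile

RECORD_SIZE = 100  # each record is 100 bytes, per the PR description
KEY_SIZE = 10      # assumption: 10-byte key + 90-byte payload (sortbenchmark.org convention)


def make_record(rng: random.Random, row_id: int) -> bytes:
    """Hypothetical record: random 10-byte key followed by a 90-byte payload."""
    key = rng.getrandbits(8 * KEY_SIZE).to_bytes(KEY_SIZE, "big")
    # Payload: row id in ASCII, padded with a repeated filler byte up to 90 bytes.
    payload = str(row_id).encode("ascii").ljust(RECORD_SIZE - KEY_SIZE, b"A")
    return key + payload


def write_records(path: str, num_records: int, compress: bool, seed: int = 42) -> int:
    """Write records back to back with no framing; return the on-disk size in bytes."""
    rng = random.Random(seed)
    opener = gzip.open if compress else open
    with opener(path, "wb") as f:
        for i in range(num_records):
            f.write(make_record(rng, i))
    return os.path.getsize(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        raw = write_records(os.path.join(tmp, "part-00000.bin"), 10_000, compress=False)
        gz = write_records(os.path.join(tmp, "part-00000.bin.gz"), 10_000, compress=True)
        print(f"uncompressed: {raw} bytes, gzip: {gz} bytes")
```

The uncompressed file is always `num_records * 100` bytes, which is what makes it comparable with another system writing the same raw format; the gzip file size depends on how repetitive the payload is.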
Github user liuqiyun commented on the pull request:
https://github.com/apache/spark/pull/1242#issuecomment-68199942
@rxin I am confused about the input parameters of GenSort.scala.
It requires 3 parameters: [num-parts] [records-per-part] [output-path].
If I want to generate and
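A sketch of the sizing arithmetic, assuming GenSort.scala writes `records-per-part` records in each of `num-parts` partitions and that every record is 100 bytes (per the PR description). The helper names are hypothetical, and "100GB" is taken as decimal (10^11 bytes); use `100 * 2**30` instead if you mean 100 GiB.

```python
RECORD_SIZE = 100  # bytes per record, per the PR description


def total_bytes(num_parts: int, records_per_part: int) -> int:
    """Uncompressed size, assuming each of num_parts partitions emits records_per_part records."""
    return num_parts * records_per_part * RECORD_SIZE


def records_per_part_for(target_bytes: int, num_parts: int) -> int:
    """Smallest records-per-part so that num_parts partitions reach at least target_bytes."""
    total_records = -(-target_bytes // RECORD_SIZE)  # ceiling division
    return -(-total_records // num_parts)            # ceiling division again


if __name__ == "__main__":
    target = 100 * 10**9  # 100 GB in decimal units (an assumption)
    parts = 1000
    rpp = records_per_part_for(target, parts)
    print(parts, rpp, total_bytes(parts, rpp))
```

Under these assumptions, 1000 partitions of 1,000,000 records each gives 10^9 records, or 10^11 bytes of uncompressed data.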
Github user jerryshao commented on the pull request:
https://github.com/apache/spark/pull/1242#issuecomment-55352215
Hi @rxin, sorry to bring this up. Are you planning to merge this terasort
example into Spark? I think this would be a good standard for testing the
performance of
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/1242#issuecomment-55364233
I don't think we are going to merge this in Spark, unless there is huge
demand from users...
---
If your project is set up for it, you can reply to this email and have
Github user pwendell commented on the pull request:
https://github.com/apache/spark/pull/1242#issuecomment-54100933
@rxin can you close this for now? It's been lingering a long time.
Github user rxin closed the pull request at:
https://github.com/apache/spark/pull/1242
Github user tgravescs commented on the pull request:
https://github.com/apache/spark/pull/1242#issuecomment-47550102
The Hadoop code for generating the data is out of date. It might not matter
for your purposes, but if you want the up-to-date version, look at
sortbenchmark.org. I had
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/1242#issuecomment-47420731
Merged build started.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/1242#issuecomment-47420727
Merged build triggered.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/1242#issuecomment-47421446
All automated tests passed.
Refer to this link for build results:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16227/
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/1242#issuecomment-47421445
Merged build finished. All automated tests passed.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/1242#issuecomment-47440103
Merged build started.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/1242#issuecomment-47440099
Merged build triggered.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/1242#issuecomment-47441036
Merged build finished. All automated tests passed.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/1242#issuecomment-47441037
All automated tests passed.
Refer to this link for build results:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16231/
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/1242#issuecomment-47310836
Merged build finished. All automated tests passed.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/1242#issuecomment-47310838
All automated tests passed.
Refer to this link for build results:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16196/
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/1242#issuecomment-47415018
Merged build triggered.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/1242#issuecomment-47415020
Merged build started.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/1242#issuecomment-47415821
Refer to this link for build results:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16224/
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/1242#issuecomment-47415820
Merged build finished.
GitHub user rxin opened a pull request:
https://github.com/apache/spark/pull/1242
[SPARK-2304] tera sort example program for shuffle benchmarks
This pull request adds an example program for benchmarking Spark shuffle.
It dynamically generates a set of 100-byte records according to
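The overall shape of a TeraSort-style shuffle benchmark can be sketched on a single machine: map tasks generate 100-byte records, records are range-partitioned by key (the shuffle), and each reducer sorts its bucket so that concatenating reducer outputs in order yields a globally sorted result. This plain-Python sketch is not the PR's Spark code. The 10-byte key convention, seeding each partition by its index, and bucketing on the first key byte (rather than sampling split points from the data) are simplifying assumptions, and all function names are hypothetical.

```python
import random

RECORD_SIZE = 100  # bytes per record, per the PR description
KEY_SIZE = 10      # assumption: the first 10 bytes are the sort key


def generate_partition(part_index, records_per_part):
    """Hypothetical map-side generator; seeded by partition index so reruns are identical."""
    rng = random.Random(part_index)
    records = []
    for _ in range(records_per_part):
        key = rng.getrandbits(8 * KEY_SIZE).to_bytes(KEY_SIZE, "big")
        value = bytes(RECORD_SIZE - KEY_SIZE)  # zero-filled 90-byte payload
        records.append(key + value)
    return records


def range_partition(records, num_reducers):
    """Shuffle step: bucket by first key byte; monotonic, so bucket i's keys sort before bucket i+1's."""
    buckets = [[] for _ in range(num_reducers)]
    for r in records:
        buckets[r[0] * num_reducers // 256].append(r)
    return buckets


def sort_records(records):
    """Reduce step: sort by key only (bytes compare lexicographically as unsigned values)."""
    return sorted(records, key=lambda r: r[:KEY_SIZE])


def is_sorted_by_key(records):
    return all(records[i][:KEY_SIZE] <= records[i + 1][:KEY_SIZE] for i in range(len(records) - 1))


if __name__ == "__main__":
    mapped = [r for p in range(4) for r in generate_partition(p, 1000)]
    buckets = range_partition(mapped, 8)
    result = [r for b in buckets for r in sort_records(b)]
    print("sorted", len(result), "records:", is_sorted_by_key(result))
```

Uniform random keys keep the first-byte buckets roughly balanced; with skewed keys, sampling split points from the data would be needed to balance reducer load.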
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/1242#issuecomment-47308594
Merged build started.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/1242#issuecomment-47308593
Merged build triggered.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/1242#issuecomment-47308636
Refer to this link for build results:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16195/
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/1242#issuecomment-47308635
Merged build finished.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/1242#issuecomment-47308992
Merged build started.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/1242#issuecomment-47308985
Merged build triggered.