Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10604#issuecomment-169331417
**[Test build #48859 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48859/consoleFull)** for PR 10604 at commit
Github user cloud-fan commented on the pull request:
https://github.com/apache/spark/pull/10604#issuecomment-169330013
You can review just the last commit to ignore the bucket-write part.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10604#issuecomment-169353471
**[Test build #48859 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48859/consoleFull)** for PR 10604 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10604#issuecomment-169354193
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10604#issuecomment-169354186
Merged build finished. Test FAILed.
Github user cloud-fan commented on the pull request:
https://github.com/apache/spark/pull/10604#issuecomment-169045155
A question about data distribution: when we read in a bucketed table, we will generate one RDD partition for each bucket. However, how can we ensure the distribution
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10604#issuecomment-169040365
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10604#issuecomment-169040364
Merged build finished. Test FAILed.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10604#issuecomment-169040297
**[Test build #48773 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48773/consoleFull)** for PR 10604 at commit
GitHub user cloud-fan opened a pull request:
https://github.com/apache/spark/pull/10604
[SPARK-12649][SQL][WIP] support reading bucketed table
TODO:
* better integration with data source API.
* correctly populate outputPartitioning/outputOrdering
* bucket pruning
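
The "bucket pruning" TODO can be sketched in miniature: because every row is routed to a bucket by hashing the bucket column, an equality filter on that column only ever needs the files of one bucket. This is a hypothetical Python illustration, not Spark's implementation — Spark uses a Murmur3-based hash, Python's built-in `hash` stands in here, and the file names are made up.

```python
# Illustrative bucket pruning: hash the filter value to a bucket id and
# scan only that bucket's files. hash() is a stand-in for Spark's real
# bucketing hash; file names below are invented for the example.

def bucket_id(key, num_buckets):
    """Map a bucket-column value to a bucket in [0, num_buckets)."""
    return hash(key) % num_buckets

def prune_files(files_by_bucket, key, num_buckets):
    """For an equality predicate on the bucket column, keep only the
    files of the single bucket the value hashes to."""
    return files_by_bucket.get(bucket_id(key, num_buckets), [])

num_buckets = 4
files_by_bucket = {
    b: [f"part-0000{b}-bucket{b}.parquet"] for b in range(num_buckets)
}
# An equality filter touches exactly one of the four buckets:
needed = prune_files(files_by_bucket, "some_key", num_buckets)
assert len(needed) == 1
```

The point of the sketch is only the invariant: the same value always hashes to the same bucket, so all other buckets can be skipped without reading them.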
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10604#issuecomment-169035044
**[Test build #48773 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48773/consoleFull)** for PR 10604 at commit
Github user nongli commented on the pull request:
https://github.com/apache/spark/pull/10604#issuecomment-169133603
@cloud-fan I'm not sure I understand your question. We are guaranteed that for a bucketed data set, each file in HDFS is for the same bucket. We need to coalesce