[GitHub] spark pull request: [SPARK-12649][SQL][WIP] support reading bucket...

2016-01-06 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10604#issuecomment-169331417 **[Test build #48859 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48859/consoleFull)** for PR 10604 at commit

[GitHub] spark pull request: [SPARK-12649][SQL][WIP] support reading bucket...

2016-01-06 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/10604#issuecomment-169330013 You can review only the last commit to skip the bucket write part.

[GitHub] spark pull request: [SPARK-12649][SQL][WIP] support reading bucket...

2016-01-06 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10604#issuecomment-169353471 **[Test build #48859 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48859/consoleFull)** for PR 10604 at commit

[GitHub] spark pull request: [SPARK-12649][SQL][WIP] support reading bucket...

2016-01-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10604#issuecomment-169354193 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-12649][SQL][WIP] support reading bucket...

2016-01-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10604#issuecomment-169354186 Merged build finished. Test FAILed.

[GitHub] spark pull request: [SPARK-12649][SQL][WIP] support reading bucket...

2016-01-05 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/10604#issuecomment-169045155 A question about data distribution: When we read in a bucketed table, we will generate one RDD partition for each bucket. However, how can we ensure the distribution

[GitHub] spark pull request: [SPARK-12649][SQL][WIP] support reading bucket...

2016-01-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10604#issuecomment-169040365 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-12649][SQL][WIP] support reading bucket...

2016-01-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10604#issuecomment-169040364 Merged build finished. Test FAILed.

[GitHub] spark pull request: [SPARK-12649][SQL][WIP] support reading bucket...

2016-01-05 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10604#issuecomment-169040297 **[Test build #48773 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48773/consoleFull)** for PR 10604 at commit

[GitHub] spark pull request: [SPARK-12649][SQL][WIP] support reading bucket...

2016-01-05 Thread cloud-fan
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/10604 [SPARK-12649][SQL][WIP] support reading bucketed table

TODO:
* better integration with data source API
* correctly populate outputPartitioning/outputOrdering
* bucket pruning

[GitHub] spark pull request: [SPARK-12649][SQL][WIP] support reading bucket...

2016-01-05 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10604#issuecomment-169035044 **[Test build #48773 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48773/consoleFull)** for PR 10604 at commit

[GitHub] spark pull request: [SPARK-12649][SQL][WIP] support reading bucket...

2016-01-05 Thread nongli
Github user nongli commented on the pull request: https://github.com/apache/spark/pull/10604#issuecomment-169133603 @cloud-fan I'm not sure I understand your question. We are guaranteed that for a bucketed data set, each file in HDFS is for the same bucket. We need to coalesce
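
The one-partition-per-bucket idea discussed in this thread can be illustrated with a small sketch. It is not code from the pull request: the function name `group_files_by_bucket`, the `(bucket_id, path)` input pairs, and the example paths are all hypothetical, and the sketch assumes each file's bucket id is already known and that each file holds rows for exactly one bucket.

```python
from typing import List, Tuple


def group_files_by_bucket(
    files: List[Tuple[int, str]], num_buckets: int
) -> List[List[str]]:
    """Group files so that index i holds every file belonging to bucket i.

    Hypothetical helper for illustration only. Each inner list stands in
    for one read partition, so a bucket with no files yields an empty list.
    """
    partitions: List[List[str]] = [[] for _ in range(num_buckets)]
    for bucket_id, path in files:
        if not 0 <= bucket_id < num_buckets:
            raise ValueError(f"bucket id {bucket_id} out of range for {path}")
        partitions[bucket_id].append(path)
    return partitions


# Hypothetical input: two writers each produced files for some buckets.
example_files = [
    (0, "/data/t/part-a-bucket0"),
    (1, "/data/t/part-a-bucket1"),
    (0, "/data/t/part-b-bucket0"),
]
example_partitions = group_files_by_bucket(example_files, num_buckets=3)
```

Under these assumptions, several files for the same bucket are coalesced into one group, so the number of groups equals the number of buckets rather than the number of files.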