[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9483#issuecomment-190527518 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52210/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9483#issuecomment-190527517 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9483#issuecomment-190527389 **[Test build #52210 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52210/consoleFull)** for PR 9483 at commit [`fdac95b`](https://github.com/apache/spark/commit/fdac95bb06546b5d92b8c5dda5ee633f2221d347).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9483#issuecomment-190497524 **[Test build #52210 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52210/consoleFull)** for PR 9483 at commit [`fdac95b`](https://github.com/apache/spark/commit/fdac95bb06546b5d92b8c5dda5ee633f2221d347).
[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...
Github user zhichao-li commented on the pull request: https://github.com/apache/spark/pull/9483#issuecomment-190497155 retest this please
[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9483#issuecomment-190033570 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9483#issuecomment-190033571 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52156/ Test FAILed.
[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9483#issuecomment-190033507 **[Test build #52156 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52156/consoleFull)** for PR 9483 at commit [`fdac95b`](https://github.com/apache/spark/commit/fdac95bb06546b5d92b8c5dda5ee633f2221d347).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9483#issuecomment-190017147 **[Test build #52156 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52156/consoleFull)** for PR 9483 at commit [`fdac95b`](https://github.com/apache/spark/commit/fdac95bb06546b5d92b8c5dda5ee633f2221d347).
[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...
Github user zhichao-li commented on the pull request: https://github.com/apache/spark/pull/9483#issuecomment-190016890 retest this please
[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9483#issuecomment-190005709 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9483#issuecomment-190005714 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52152/ Test FAILed.
[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9483#issuecomment-190005200 **[Test build #52152 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52152/consoleFull)** for PR 9483 at commit [`fdac95b`](https://github.com/apache/spark/commit/fdac95bb06546b5d92b8c5dda5ee633f2221d347).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9483#issuecomment-189986044 **[Test build #52152 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52152/consoleFull)** for PR 9483 at commit [`fdac95b`](https://github.com/apache/spark/commit/fdac95bb06546b5d92b8c5dda5ee633f2221d347).
[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/9483#issuecomment-189470296 LGTM except for some minor suggestions.
[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/9483#discussion_r54296531
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveTableScanSuite.scala ---
@@ -89,4 +89,25 @@ class HiveTableScanSuite extends HiveComparisonTest {
     assert(sql("select CaseSensitiveColName from spark_4959_2").head() === Row("hi"))
     assert(sql("select casesensitivecolname from spark_4959_2").head() === Row("hi"))
   }
+
--- End diff --
remove the extra empty line.
[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/9483#discussion_r54296448
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/ParallelUnionRDD.scala ---
@@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.util.concurrent.Callable
+
+import scala.reflect.ClassTag
+
+import org.apache.spark.{Partition, SparkContext}
+import org.apache.spark.rdd.{RDD, UnionPartition, UnionRDD}
+import org.apache.spark.util.ThreadUtils
+
+object ParallelUnionRDD {
--- End diff --
`private[hive]` or move it into the upper level package? The same for the class `ParallelUnionRDD`.
[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/9483#discussion_r54296499
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/ParallelUnionRDD.scala ---
@@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.util.concurrent.Callable
+
+import scala.reflect.ClassTag
+
+import org.apache.spark.{Partition, SparkContext}
+import org.apache.spark.rdd.{RDD, UnionPartition, UnionRDD}
+import org.apache.spark.util.ThreadUtils
+
+object ParallelUnionRDD {
+  lazy val executorService = ThreadUtils.newDaemonFixedThreadPool(16, "ParallelUnionRDD")
+}
+
+class ParallelUnionRDD[T: ClassTag](
+  sc: SparkContext,
+  rdds: Seq[RDD[T]]) extends UnionRDD[T](sc, rdds){
+
+  override def getPartitions: Array[Partition] = {
+    // Calc partitions field for each RDD in parallel.
+    val rddPartitions = rdds.map {rdd =>
+      (rdd, ParallelUnionRDD.executorService.submit(new Callable[Array[Partition]] {
+        override def call(): Array[Partition] = rdd.partitions
+      }))
+    }.map {case(r, f) => (r, f.get())}
--- End diff --
space before `}` and after `{`
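The diff under review submits one task per child RDD and only then blocks on the futures. As a rough, Spark-free illustration of that submit-then-get pattern, here is a minimal sketch. `Source`, `computeAll`, and `ParallelPartitionsSketch` are hypothetical names made up for this example and are not part of the pull request; `Source.partitions` merely stands in for a slow `rdd.partitions` call.

```scala
import java.util.concurrent.{Callable, Executors, ExecutorService, Future}

object ParallelPartitionsSketch {
  // Hypothetical stand-in for an RDD whose partition listing may be slow
  // (for example, because it lists files on a remote file system).
  final case class Source(name: String, partitionCount: Int) {
    def partitions: Array[String] =
      Array.tabulate(partitionCount)(i => s"$name-$i")
  }

  // Submit every task first, then block on the futures in input order.
  // The listings overlap in time, while the result order stays deterministic.
  def computeAll(sources: Seq[Source], pool: ExecutorService): Seq[(Source, Array[String])] = {
    // toVector forces a strict collection so all tasks are submitted before any get().
    val futures: Seq[(Source, Future[Array[String]])] = sources.toVector.map { s =>
      (s, pool.submit(new Callable[Array[String]] {
        override def call(): Array[String] = s.partitions
      }))
    }
    futures.map { case (s, f) => (s, f.get()) }
  }

  def main(args: Array[String]): Unit = {
    val pool = Executors.newFixedThreadPool(16)
    try {
      val result = computeAll(Seq(Source("a", 2), Source("b", 3)), pool)
      println(result.flatMap(_._2).mkString(", "))
    } finally {
      pool.shutdown()
    }
  }
}
```

Note that the two `map` calls must not be fused into one: calling `get()` inside the first `map` would wait for each task before submitting the next and make the work sequential again.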
[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9483#issuecomment-188589240 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9483#issuecomment-188589241 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51920/ Test PASSed.
[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9483#issuecomment-188589088 **[Test build #51920 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51920/consoleFull)** for PR 9483 at commit [`db84ab9`](https://github.com/apache/spark/commit/db84ab94d26e945fc44ef2adb789eb85ad229a3c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9483#issuecomment-188564809 **[Test build #51920 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51920/consoleFull)** for PR 9483 at commit [`db84ab9`](https://github.com/apache/spark/commit/db84ab94d26e945fc44ef2adb789eb85ad229a3c).
[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...
Github user zhichao-li commented on the pull request: https://github.com/apache/spark/pull/9483#issuecomment-188561642 retest this please
[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9483#issuecomment-188155757 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51861/ Test FAILed.
[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9483#issuecomment-188155753 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9483#issuecomment-188155365 **[Test build #51861 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51861/consoleFull)** for PR 9483 at commit [`db84ab9`](https://github.com/apache/spark/commit/db84ab94d26e945fc44ef2adb789eb85ad229a3c).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9483#issuecomment-188130564 **[Test build #51861 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51861/consoleFull)** for PR 9483 at commit [`db84ab9`](https://github.com/apache/spark/commit/db84ab94d26e945fc44ef2adb789eb85ad229a3c).
[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9483#issuecomment-188126065 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51856/ Test FAILed.
[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9483#issuecomment-188126062 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9483#issuecomment-188126052 **[Test build #51856 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51856/consoleFull)** for PR 9483 at commit [`6456f12`](https://github.com/apache/spark/commit/6456f12c3d4554a03d18f9d8d26ad315e33753d8).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9483#issuecomment-188125447 **[Test build #51856 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51856/consoleFull)** for PR 9483 at commit [`6456f12`](https://github.com/apache/spark/commit/6456f12c3d4554a03d18f9d8d26ad315e33753d8).
[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/9483#discussion_r53125003
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/ParallelUnionRDD.scala ---
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.util.concurrent.Callable
+
+import org.apache.spark.rdd.{RDD, UnionPartition, UnionRDD}
+import org.apache.spark.util.ThreadUtils
+import org.apache.spark.{Partition, SparkContext}
+
+import scala.reflect.ClassTag
+
+class ParallelUnionRDD[T: ClassTag](
+  sc: SparkContext,
+  rdds: Seq[RDD[T]]) extends UnionRDD[T](sc, rdds){
+  // TODO: We might need to guess a more reasonable thread pool size here
+  @transient val executorService = ThreadUtils.newDaemonFixedThreadPool(
+    Math.min(rdds.size, Runtime.getRuntime.availableProcessors()), "ParallelUnionRDD")
--- End diff --
I don't think we need to bound the pool size by `Runtime.getRuntime.availableProcessors()`. We can probably just use a fixed number, say `16` or even bigger, as the bottleneck is network / IO, not CPU scheduling.
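To make the suggestion concrete, the following is a minimal sketch of a shared, fixed-size pool of daemon threads built only on `java.util.concurrent`. `DaemonPoolSketch`, `newDaemonFixedPool`, and `sharedPool` are hypothetical names for illustration; the pull request itself uses Spark's `ThreadUtils.newDaemonFixedThreadPool`, whose internals are not reproduced here. The size `16` is simply the number floated in the review, not a tuned value.

```scala
import java.util.concurrent.{Executors, ExecutorService, ThreadFactory}
import java.util.concurrent.atomic.AtomicInteger

object DaemonPoolSketch {
  // Hypothetical helper: a fixed-size pool whose threads are named and marked as
  // daemons, so idle workers never keep the JVM alive on their own.
  def newDaemonFixedPool(size: Int, prefix: String): ExecutorService = {
    val counter = new AtomicInteger(0)
    val factory = new ThreadFactory {
      override def newThread(r: Runnable): Thread = {
        val t = new Thread(r, s"$prefix-${counter.getAndIncrement()}")
        t.setDaemon(true)
        t
      }
    }
    Executors.newFixedThreadPool(size, factory)
  }

  // One pool shared by all callers and created on first use. The tasks mostly wait
  // on network / IO, so the size is not tied to the number of CPU cores.
  lazy val sharedPool: ExecutorService = newDaemonFixedPool(16, "ParallelUnionRDD")
}
```

Holding the pool in a singleton object rather than in each RDD instance avoids creating a new pool per union and sidesteps serializing an `ExecutorService` along with the RDD.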
[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/9483#discussion_r53124605
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/ParallelUnionRDD.scala ---
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.util.concurrent.Callable
+
+import org.apache.spark.rdd.{RDD, UnionPartition, UnionRDD}
+import org.apache.spark.util.ThreadUtils
+import org.apache.spark.{Partition, SparkContext}
+
+import scala.reflect.ClassTag
+
+class ParallelUnionRDD[T: ClassTag](
+  sc: SparkContext,
+  rdds: Seq[RDD[T]]) extends UnionRDD[T](sc, rdds){
+  // TODO: We might need to guess a more reasonable thread pool size here
+  @transient val executorService = ThreadUtils.newDaemonFixedThreadPool(
+    Math.min(rdds.size, Runtime.getRuntime.availableProcessors()), "ParallelUnionRDD")
+
+  override def getPartitions: Array[Partition] = {
+    // Calc partitions field for each RDD in parallel.
+    val rddPartitions = rdds.map {rdd =>
+      (rdd, executorService.submit(new Callable[Array[Partition]] {
+        override def call(): Array[Partition] = rdd.partitions
+      }))
+    }.map {case(r, f) => (r, f.get())}
+
+    val array = new Array[Partition](rddPartitions.map(_._2.length).sum)
--- End diff --
This still seems to run on the main thread, so we probably don't even need the `synchronized` in `getPartitions`.
[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/9483#discussion_r53124525

--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -211,7 +211,7 @@ abstract class RDD[T: ClassTag](
   // Our dependencies and partitions will be gotten by calling subclass's methods below, and will
   // be overwritten when we're checkpointed
   private var dependencies_ : Seq[Dependency[_]] = null
-  @transient private var partitions_ : Array[Partition] = null
+  @transient @volatile private var partitions_ : Array[Partition] = null
--- End diff --

To be more precise: https://github.com/apache/spark/pull/9483/files#diff-f4d927f57038fd77e8df7e976a0f29b3R35
[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/9483#discussion_r53124507

--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -211,7 +211,7 @@ abstract class RDD[T: ClassTag](
   // Our dependencies and partitions will be gotten by calling subclass's methods below, and will
   // be overwritten when we're checkpointed
   private var dependencies_ : Seq[Dependency[_]] = null
-  @transient private var partitions_ : Array[Partition] = null
+  @transient @volatile private var partitions_ : Array[Partition] = null
--- End diff --

As I understand it, we don't need `@volatile` here. The only change we probably need is to add the `synchronized` modifier to `getPartitions` in the concrete RDD subclass, which acts as a memory barrier in the JVM memory model and forces the CPU cache to be flushed to main memory.
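
The two options discussed in this comment can be compared with a minimal sketch. It is not Spark code: `CachedPartitions` and `LockedPartitions` are hypothetical stand-ins for the lazily cached `partitions_` field, and `Array[Int]` stands in for `Array[Partition]`. In the Java memory model, a write to a volatile field happens-before any later read of that field, while releasing a monitor happens-before any later acquisition of the same monitor.

```scala
// Hypothetical stand-ins for the lazy caching around `partitions_`; not Spark code.

// Option 1: @volatile field. A thread that reads a non-null reference is
// guaranteed to also see the array contents written before the field was set.
final class CachedPartitions(compute: () => Array[Int]) {
  @volatile private var cached: Array[Int] = null

  def get: Array[Int] = {
    val seen = cached
    if (seen != null) seen
    else {
      // Two threads can both reach this point; @volatile gives visibility,
      // not mutual exclusion, so `compute` may run more than once.
      val fresh = compute()
      cached = fresh
      fresh
    }
  }
}

// Option 2: synchronized accessor. Only one thread computes, and the result
// is visible to every thread that later enters the same lock.
final class LockedPartitions(compute: () => Array[Int]) {
  private var cached: Array[Int] = null

  def get: Array[Int] = this.synchronized {
    if (cached == null) cached = compute()
    cached
  }
}
```

Note that the guarantee in the second variant only covers threads that read through the same lock; a read of the field that bypasses `get` would not be ordered by it.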
[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...
Github user zhichao-li commented on the pull request: https://github.com/apache/spark/pull/9483#issuecomment-172447842

@yhuai @rxin, any thoughts or concerns about this PR? It is common for a single table to contain a very large number of partitions (e.g. one partition every 15 minutes for click data).
[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...
Github user zhichao-li commented on a diff in the pull request: https://github.com/apache/spark/pull/9483#discussion_r45285408

--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -211,7 +211,7 @@ abstract class RDD[T: ClassTag](
   // Our dependencies and partitions will be gotten by calling subclass's methods below, and will
   // be overwritten when we're checkpointed
   private var dependencies_ : Seq[Dependency[_]] = null
-  @transient private var partitions_ : Array[Partition] = null
+  @transient @volatile private var partitions_ : Array[Partition] = null
--- End diff --

`partitions_` would be read and written by multiple threads, so `@volatile` is added here for visibility.
[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...
Github user zhichao-li commented on a diff in the pull request: https://github.com/apache/spark/pull/9483#discussion_r45285157

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/ParallelUnionRDD.scala ---
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.util.concurrent.Callable
+
+import org.apache.spark.rdd.{RDD, UnionPartition, UnionRDD}
+import org.apache.spark.util.ThreadUtils
+import org.apache.spark.{Partition, SparkContext}
+
+import scala.reflect.ClassTag
+
+class ParallelUnionRDD[T: ClassTag](
+    sc: SparkContext,
+    rdds: Seq[RDD[T]]) extends UnionRDD[T](sc, rdds){
+  // TODO: We might need to guess a more reasonable thread pool size here
+  @transient val executorService = ThreadUtils.newDaemonFixedThreadPool(
+    Math.min(rdds.size, Runtime.getRuntime.availableProcessors()), "ParallelUnionRDD")
--- End diff --

I don't have a strong opinion on this. How about creating a shared thread pool sized to the number of CPU cores?

``` scala
object ParallelUnionRDD{
  val executorService = ThreadUtils.newDaemonFixedThreadPool(Runtime.getRuntime.availableProcessors(), "ParallelUnionRDD")
}
```
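
For readers outside the Spark code base, the shared-pool idea can be sketched with only `java.util.concurrent`. This is an illustration under assumptions, not the PR's code: `SharedPartitionPool` is a hypothetical name, and the hand-written `ThreadFactory` stands in for what `ThreadUtils.newDaemonFixedThreadPool` is used for in the snippet above.

```scala
import java.util.concurrent.{ExecutorService, Executors, ThreadFactory}
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical holder object; in the PR discussion this role would be
// played by a companion `object ParallelUnionRDD`.
object SharedPartitionPool {
  private val counter = new AtomicInteger(0)

  // Daemon threads do not keep the JVM alive once the main thread exits.
  private val factory: ThreadFactory = new ThreadFactory {
    override def newThread(r: Runnable): Thread = {
      val t = new Thread(r, s"ParallelUnionRDD-${counter.incrementAndGet()}")
      t.setDaemon(true)
      t
    }
  }

  // Created once per JVM on first access and shared by every caller,
  // instead of one pool per RDD instance.
  lazy val executorService: ExecutorService =
    Executors.newFixedThreadPool(Runtime.getRuntime.availableProcessors(), factory)
}
```

Holding the pool in an object also means it is never captured when an RDD instance is serialized, which is the concern the `@transient` annotation addresses in the per-instance version.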
[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/9483#discussion_r45269601

--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -211,7 +211,7 @@ abstract class RDD[T: ClassTag](
   // Our dependencies and partitions will be gotten by calling subclass's methods below, and will
   // be overwritten when we're checkpointed
   private var dependencies_ : Seq[Dependency[_]] = null
-  @transient private var partitions_ : Array[Partition] = null
+  @transient @volatile private var partitions_ : Array[Partition] = null
--- End diff --

Do we need this?
[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/9483#discussion_r45269521

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/ParallelUnionRDD.scala ---
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.util.concurrent.Callable
+
+import org.apache.spark.rdd.{RDD, UnionPartition, UnionRDD}
+import org.apache.spark.util.ThreadUtils
+import org.apache.spark.{Partition, SparkContext}
+
+import scala.reflect.ClassTag
+
+class ParallelUnionRDD[T: ClassTag](
+    sc: SparkContext,
+    rdds: Seq[RDD[T]]) extends UnionRDD[T](sc, rdds){
+  // TODO: We might need to guess a more reasonable thread pool size here
+  @transient val executorService = ThreadUtils.newDaemonFixedThreadPool(
+    Math.min(rdds.size, Runtime.getRuntime.availableProcessors()), "ParallelUnionRDD")
--- End diff --

Should we share a single thread pool instead of creating one for every `ParallelUnionRDD`?
[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...
Github user zhonghaihua commented on the pull request: https://github.com/apache/spark/pull/9483#issuecomment-156810307

Hi @zhichao-li, thanks for doing this. I ran into a problem with slow partition scanning, so I applied this patch to my Spark version. In my case:
* Before applying this patch, scanning partitions took at least 3 to 4 minutes.
* After applying this patch, this stage takes only about 20 seconds.

I am happy to see that it works in my case and solves my problem. Would it be better to add a conf option to control whether this feature is used?
[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9483#issuecomment-153930955

Merged build started.
[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9483#issuecomment-153930934

Merged build triggered.
[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...
GitHub user zhichao-li opened a pull request: https://github.com/apache/spark/pull/9483

[SPARK-11517][SQL]Calc partitions in parallel for multiple partitions table

Currently `getPartitions` is computed sequentially for each Hive partition. It would be faster to compute these in parallel on the driver side.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zhichao-li/spark parallelUnionRDD

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9483.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #9483

commit 63dc9c04cc5d5fc9b815685dab1ba6d5811a999c
Author: zhichao.li
Date: 2015-11-04T08:28:08Z

    parallel
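
The approach described in this pull request can be sketched without any Spark dependency. The following is a minimal illustration under assumptions, not the patch itself: `Source` is a hypothetical stand-in for an RDD whose `partitions` call is slow, and `allPartitions` is an invented helper name.

```scala
import java.util.concurrent.{Callable, Executors}

object ParallelPartitionsSketch {
  // Hypothetical stand-in for an RDD whose `partitions` call is expensive
  // (for example because it lists files for one Hive partition).
  final case class Source(name: String, numPartitions: Int) {
    def partitions: Array[String] = Array.tabulate(numPartitions)(i => s"$name-$i")
  }

  def allPartitions(sources: Seq[Source], threads: Int): Array[String] = {
    val pool = Executors.newFixedThreadPool(math.max(1, threads))
    try {
      // Submit every task first so the per-source work overlaps.
      // `toVector` forces strict evaluation even if a lazy Seq is passed in.
      val futures = sources.toVector.map { s =>
        pool.submit(new Callable[Array[String]] {
          override def call(): Array[String] = s.partitions
        })
      }
      // Collect on the calling thread, in input order, so the combined
      // array has the same layout as the sequential version.
      futures.flatMap(_.get()).toArray
    } finally pool.shutdown()
  }
}
```

For example, `ParallelPartitionsSketch.allPartitions(Seq(Source("a", 2), Source("b", 1)), 2)` yields the partitions of `a` followed by those of `b`. The submit-then-collect order matters: calling `get()` inside the same `map` that submits would serialize the work again.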
[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...
Github user zhichao-li commented on the pull request: https://github.com/apache/spark/pull/9483#issuecomment-153930848

cc @chenghao-intel
[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9483#issuecomment-153931045

**[Test build #45083 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45083/consoleFull)** for PR 9483 at commit [`63dc9c0`](https://github.com/apache/spark/commit/63dc9c04cc5d5fc9b815685dab1ba6d5811a999c).
[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9483#issuecomment-153956255

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45083/ Test PASSed.
[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9483#issuecomment-153956253

Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9483#issuecomment-153956188

**[Test build #45083 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45083/consoleFull)** for PR 9483 at commit [`63dc9c0`](https://github.com/apache/spark/commit/63dc9c04cc5d5fc9b815685dab1ba6d5811a999c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class ParallelUnionRDD[T: ClassTag](`
[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/9483#issuecomment-153974427

cc @scwf @Sephiroth-Lin: not sure whether you have time to benchmark this with real-world cases.