[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2016-02-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9483#issuecomment-190527518
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52210/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2016-02-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9483#issuecomment-190527517
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2016-02-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9483#issuecomment-190527389
  
**[Test build #52210 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52210/consoleFull)** for PR 9483 at commit [`fdac95b`](https://github.com/apache/spark/commit/fdac95bb06546b5d92b8c5dda5ee633f2221d347).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2016-02-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9483#issuecomment-190497524
  
**[Test build #52210 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52210/consoleFull)** for PR 9483 at commit [`fdac95b`](https://github.com/apache/spark/commit/fdac95bb06546b5d92b8c5dda5ee633f2221d347).





[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2016-02-29 Thread zhichao-li
Github user zhichao-li commented on the pull request:

https://github.com/apache/spark/pull/9483#issuecomment-190497155
  
retest this please





[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2016-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9483#issuecomment-190033570
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2016-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9483#issuecomment-190033571
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52156/
Test FAILed.





[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2016-02-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9483#issuecomment-190033507
  
**[Test build #52156 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52156/consoleFull)** for PR 9483 at commit [`fdac95b`](https://github.com/apache/spark/commit/fdac95bb06546b5d92b8c5dda5ee633f2221d347).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2016-02-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9483#issuecomment-190017147
  
**[Test build #52156 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52156/consoleFull)** for PR 9483 at commit [`fdac95b`](https://github.com/apache/spark/commit/fdac95bb06546b5d92b8c5dda5ee633f2221d347).





[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2016-02-28 Thread zhichao-li
Github user zhichao-li commented on the pull request:

https://github.com/apache/spark/pull/9483#issuecomment-190016890
  
retest this please





[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2016-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9483#issuecomment-190005709
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2016-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9483#issuecomment-190005714
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52152/
Test FAILed.





[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2016-02-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9483#issuecomment-190005200
  
**[Test build #52152 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52152/consoleFull)** for PR 9483 at commit [`fdac95b`](https://github.com/apache/spark/commit/fdac95bb06546b5d92b8c5dda5ee633f2221d347).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2016-02-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9483#issuecomment-189986044
  
**[Test build #52152 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52152/consoleFull)** for PR 9483 at commit [`fdac95b`](https://github.com/apache/spark/commit/fdac95bb06546b5d92b8c5dda5ee633f2221d347).





[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2016-02-26 Thread chenghao-intel
Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/9483#issuecomment-189470296
  
LGTM, except for some minor suggestions.





[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2016-02-26 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/9483#discussion_r54296531
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveTableScanSuite.scala ---
@@ -89,4 +89,25 @@ class HiveTableScanSuite extends HiveComparisonTest {
     assert(sql("select CaseSensitiveColName from spark_4959_2").head() === Row("hi"))
     assert(sql("select casesensitivecolname from spark_4959_2").head() === Row("hi"))
   }
+
--- End diff --

Remove the extra empty line.





[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2016-02-26 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/9483#discussion_r54296448
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/ParallelUnionRDD.scala ---
@@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.util.concurrent.Callable
+
+import scala.reflect.ClassTag
+
+import org.apache.spark.{Partition, SparkContext}
+import org.apache.spark.rdd.{RDD, UnionPartition, UnionRDD}
+import org.apache.spark.util.ThreadUtils
+
+object ParallelUnionRDD {
--- End diff --

Should this be `private[hive]`, or moved into the upper-level package? The same
applies to the class `ParallelUnionRDD`.





[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2016-02-26 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/9483#discussion_r54296499
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/ParallelUnionRDD.scala ---
@@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.util.concurrent.Callable
+
+import scala.reflect.ClassTag
+
+import org.apache.spark.{Partition, SparkContext}
+import org.apache.spark.rdd.{RDD, UnionPartition, UnionRDD}
+import org.apache.spark.util.ThreadUtils
+
+object ParallelUnionRDD {
+  lazy val executorService = ThreadUtils.newDaemonFixedThreadPool(16, "ParallelUnionRDD")
+}
+
+class ParallelUnionRDD[T: ClassTag](
+  sc: SparkContext,
+  rdds: Seq[RDD[T]]) extends UnionRDD[T](sc, rdds){
+
+  override def getPartitions: Array[Partition] = {
+    // Calc partitions field for each RDD in parallel.
+    val rddPartitions = rdds.map {rdd =>
+      (rdd, ParallelUnionRDD.executorService.submit(new Callable[Array[Partition]] {
+        override def call(): Array[Partition] = rdd.partitions
+      }))
+    }.map {case(r, f) => (r, f.get())}
--- End diff --

Add a space before `}` and after `{`.





[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2016-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9483#issuecomment-188589240
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2016-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9483#issuecomment-188589241
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51920/
Test PASSed.





[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2016-02-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9483#issuecomment-188589088
  
**[Test build #51920 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51920/consoleFull)** for PR 9483 at commit [`db84ab9`](https://github.com/apache/spark/commit/db84ab94d26e945fc44ef2adb789eb85ad229a3c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2016-02-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9483#issuecomment-188564809
  
**[Test build #51920 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51920/consoleFull)** for PR 9483 at commit [`db84ab9`](https://github.com/apache/spark/commit/db84ab94d26e945fc44ef2adb789eb85ad229a3c).





[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2016-02-24 Thread zhichao-li
Github user zhichao-li commented on the pull request:

https://github.com/apache/spark/pull/9483#issuecomment-188561642
  
retest this please





[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2016-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9483#issuecomment-188155757
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51861/
Test FAILed.





[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2016-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9483#issuecomment-188155753
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2016-02-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9483#issuecomment-188155365
  
**[Test build #51861 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51861/consoleFull)** for PR 9483 at commit [`db84ab9`](https://github.com/apache/spark/commit/db84ab94d26e945fc44ef2adb789eb85ad229a3c).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2016-02-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9483#issuecomment-188130564
  
**[Test build #51861 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51861/consoleFull)** for PR 9483 at commit [`db84ab9`](https://github.com/apache/spark/commit/db84ab94d26e945fc44ef2adb789eb85ad229a3c).





[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2016-02-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9483#issuecomment-188126065
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51856/
Test FAILed.





[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2016-02-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9483#issuecomment-188126062
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2016-02-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9483#issuecomment-188126052
  
**[Test build #51856 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51856/consoleFull)** for PR 9483 at commit [`6456f12`](https://github.com/apache/spark/commit/6456f12c3d4554a03d18f9d8d26ad315e33753d8).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2016-02-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9483#issuecomment-188125447
  
**[Test build #51856 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51856/consoleFull)** for PR 9483 at commit [`6456f12`](https://github.com/apache/spark/commit/6456f12c3d4554a03d18f9d8d26ad315e33753d8).





[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2016-02-16 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/9483#discussion_r53125003
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/ParallelUnionRDD.scala ---
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.util.concurrent.Callable
+
+import org.apache.spark.rdd.{RDD, UnionPartition, UnionRDD}
+import org.apache.spark.util.ThreadUtils
+import org.apache.spark.{Partition, SparkContext}
+
+import scala.reflect.ClassTag
+
+class ParallelUnionRDD[T: ClassTag](
+  sc: SparkContext,
+  rdds: Seq[RDD[T]]) extends UnionRDD[T](sc, rdds){
+  // TODO: We might need to guess a more reasonable thread pool size here
+  @transient val executorService = ThreadUtils.newDaemonFixedThreadPool(
+    Math.min(rdds.size, Runtime.getRuntime.availableProcessors()), "ParallelUnionRDD")
--- End diff --

I don't think we need to cap the pool size at
`Runtime.getRuntime.availableProcessors()`. We can probably just use a fixed
number, say `16` or even bigger, since the bottleneck is network / IO, not
CPU scheduling.
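
To make the sizing argument concrete, here is a minimal sketch of the same submit-then-collect pattern on a fixed-size pool. It uses plain `java.util.concurrent` rather than Spark's internal `ThreadUtils`; the names `IoBoundPoolSketch`, `runAll`, and `PoolSize`, and the `Thread.sleep` stand-in for a slow partition lookup, are assumptions for illustration and are not part of the patch.

```scala
import java.util.concurrent.{Callable, Executors, ExecutorService, Future}

object IoBoundPoolSketch {
  // Hypothetical pool size taken from the review suggestion; not tuned.
  val PoolSize = 16

  // Runs every task on the given pool and returns the results in input order.
  def runAll[A](pool: ExecutorService, tasks: Seq[() => A]): Seq[A] = {
    // Force a strict collection so that all tasks are submitted before we
    // block on the first result; a lazy Seq would serialize the work.
    val futures: Vector[Future[A]] = tasks.toVector.map { task =>
      pool.submit(new Callable[A] {
        override def call(): A = task()
      })
    }
    // Block on each future in turn; waiting threads use almost no CPU,
    // which is why the pool can be larger than the core count.
    futures.map(_.get())
  }

  def main(args: Array[String]): Unit = {
    val pool = Executors.newFixedThreadPool(PoolSize)
    try {
      // Thread.sleep stands in for a slow metadata or file-listing call.
      val tasks = (1 to 8).map { i => () => { Thread.sleep(100); i * 10 } }
      println(runAll(pool, tasks).mkString(", "))
    } finally {
      pool.shutdown()
    }
  }
}
```

With eight 100 ms tasks, a pool of 16 lets them all wait at the same time regardless of how many cores the driver has, whereas a pool capped at the core count would queue some of them on a small machine.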







[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2016-02-16 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/9483#discussion_r53124605
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/ParallelUnionRDD.scala ---
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.util.concurrent.Callable
+
+import org.apache.spark.rdd.{RDD, UnionPartition, UnionRDD}
+import org.apache.spark.util.ThreadUtils
+import org.apache.spark.{Partition, SparkContext}
+
+import scala.reflect.ClassTag
+
+class ParallelUnionRDD[T: ClassTag](
+  sc: SparkContext,
+  rdds: Seq[RDD[T]]) extends UnionRDD[T](sc, rdds){
+  // TODO: We might need to guess a more reasonable thread pool size here
+  @transient val executorService = ThreadUtils.newDaemonFixedThreadPool(
+    Math.min(rdds.size, Runtime.getRuntime.availableProcessors()), "ParallelUnionRDD")
+
+  override def getPartitions: Array[Partition] = {
+    // Calc partitions field for each RDD in parallel.
+    val rddPartitions = rdds.map {rdd =>
+      (rdd, executorService.submit(new Callable[Array[Partition]] {
+        override def call(): Array[Partition] = rdd.partitions
+      }))
+    }.map {case(r, f) => (r, f.get())}
+
+    val array = new Array[Partition](rddPartitions.map(_._2.length).sum)
--- End diff --

This code still seems to run on the main thread, so we probably don't even 
need to put `synchronized` on `getPartitions`.
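
A small sketch of the point about threads, using plain `java.util.concurrent` and a hypothetical `CallerThreadDemo` object (nothing here is from the patch): the task body runs on a pool thread, but everything after `Future.get()` runs on the thread that called it. The `java.util.concurrent` memory-consistency rules also state that actions in the task happen-before actions following a successful `get()` in another thread.

```scala
import java.util.concurrent.{Callable, Executors}

object CallerThreadDemo {
  // Returns (name of the thread that ran the task, name of the thread that collected it).
  def run(): (String, String) = {
    val pool = Executors.newFixedThreadPool(2)
    try {
      val future = pool.submit(new Callable[String] {
        // Runs on a pool thread.
        override def call(): String = Thread.currentThread().getName
      })
      // get() blocks the caller; the code after it runs on the caller's thread.
      val workerName = future.get()
      (workerName, Thread.currentThread().getName)
    } finally {
      pool.shutdown()
    }
  }
}
```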





[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2016-02-16 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/9483#discussion_r53124525
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -211,7 +211,7 @@ abstract class RDD[T: ClassTag](
   // Our dependencies and partitions will be gotten by calling subclass's methods below, and will
   // be overwritten when we're checkpointed
   private var dependencies_ : Seq[Dependency[_]] = null
-  @transient private var partitions_ : Array[Partition] = null
+  @transient @volatile private var partitions_ : Array[Partition] = null
--- End diff --

To be more precise, see: 
https://github.com/apache/spark/pull/9483/files#diff-f4d927f57038fd77e8df7e976a0f29b3R35





[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2016-02-16 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/9483#discussion_r53124507
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -211,7 +211,7 @@ abstract class RDD[T: ClassTag](
   // Our dependencies and partitions will be gotten by calling subclass's methods below, and will
   // be overwritten when we're checkpointed
   private var dependencies_ : Seq[Dependency[_]] = null
-  @transient private var partitions_ : Array[Partition] = null
+  @transient @volatile private var partitions_ : Array[Partition] = null
--- End diff --

As I understand it, we don't need `@volatile` here. Probably the only change 
needed is to add the `synchronized` modifier to the `getPartitions` method in 
the concrete subclass of RDD, which acts as a memory barrier under the JVM 
memory model and forces the CPU cache to be flushed to memory.
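
For comparison, a minimal sketch of the two options being weighed for a lazily cached field. Both classes (`VolatileCache`, `SynchronizedCache`) are hypothetical and are not taken from `RDD.scala`. One caveat that holds for the JVM in general: `synchronized` only guarantees visibility between threads that lock the same monitor, so the second variant relies on every read going through the synchronized method.

```scala
// Variant A: volatile field with double-checked locking.
class VolatileCache(compute: () => Array[Int]) {
  @volatile private var cached: Array[Int] = null

  def get: Array[Int] = {
    if (cached == null) {
      synchronized {
        // Re-check under the lock so compute() runs at most once.
        if (cached == null) cached = compute()
      }
    }
    cached
  }
}

// Variant B: plain field, every access synchronized on the same monitor.
class SynchronizedCache(compute: () => Array[Int]) {
  private var cached: Array[Int] = null

  def get: Array[Int] = synchronized {
    if (cached == null) cached = compute()
    cached
  }
}
```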





[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2016-01-17 Thread zhichao-li
Github user zhichao-li commented on the pull request:

https://github.com/apache/spark/pull/9483#issuecomment-172447842
  
@yhuai @rxin, any thoughts or concerns about this PR? It's common for one 
table to contain tons of partitions (e.g. one partition every 15 minutes for 
click data).





[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2015-11-18 Thread zhichao-li
Github user zhichao-li commented on a diff in the pull request:

https://github.com/apache/spark/pull/9483#discussion_r45285408
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -211,7 +211,7 @@ abstract class RDD[T: ClassTag](
   // Our dependencies and partitions will be gotten by calling subclass's methods below, and will
   // be overwritten when we're checkpointed
   private var dependencies_ : Seq[Dependency[_]] = null
-  @transient private var partitions_ : Array[Partition] = null
+  @transient @volatile private var partitions_ : Array[Partition] = null
--- End diff --

`partitions_` would be written and read by multiple threads, so I added 
`@volatile` here for visibility.





[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2015-11-18 Thread zhichao-li
Github user zhichao-li commented on a diff in the pull request:

https://github.com/apache/spark/pull/9483#discussion_r45285157
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/ParallelUnionRDD.scala ---
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.util.concurrent.Callable
+
+import org.apache.spark.rdd.{RDD, UnionPartition, UnionRDD}
+import org.apache.spark.util.ThreadUtils
+import org.apache.spark.{Partition, SparkContext}
+
+import scala.reflect.ClassTag
+
+class ParallelUnionRDD[T: ClassTag](
+  sc: SparkContext,
+  rdds: Seq[RDD[T]]) extends UnionRDD[T](sc, rdds){
+  // TODO: We might need to guess a more reasonable thread pool size here
+  @transient val executorService = ThreadUtils.newDaemonFixedThreadPool(
+    Math.min(rdds.size, Runtime.getRuntime.availableProcessors()), "ParallelUnionRDD")
--- End diff --

I don't have a strong opinion on this. How about creating a shared thread 
pool with the same size as the number of CPU cores?
```scala
object ParallelUnionRDD {
  // One pool shared by all ParallelUnionRDD instances.
  val executorService = ThreadUtils.newDaemonFixedThreadPool(
    Runtime.getRuntime.availableProcessors(), "ParallelUnionRDD")
}
```





[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2015-11-18 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9483#discussion_r45269601
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -211,7 +211,7 @@ abstract class RDD[T: ClassTag](
   // Our dependencies and partitions will be gotten by calling subclass's methods below, and will
   // be overwritten when we're checkpointed
   private var dependencies_ : Seq[Dependency[_]] = null
-  @transient private var partitions_ : Array[Partition] = null
+  @transient @volatile private var partitions_ : Array[Partition] = null
--- End diff --

Do we need this?





[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2015-11-18 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9483#discussion_r45269521
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/ParallelUnionRDD.scala ---
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.util.concurrent.Callable
+
+import org.apache.spark.rdd.{RDD, UnionPartition, UnionRDD}
+import org.apache.spark.util.ThreadUtils
+import org.apache.spark.{Partition, SparkContext}
+
+import scala.reflect.ClassTag
+
+class ParallelUnionRDD[T: ClassTag](
+  sc: SparkContext,
+  rdds: Seq[RDD[T]]) extends UnionRDD[T](sc, rdds){
+  // TODO: We might need to guess a more reasonable thread pool size here
+  @transient val executorService = ThreadUtils.newDaemonFixedThreadPool(
+    Math.min(rdds.size, Runtime.getRuntime.availableProcessors()), "ParallelUnionRDD")
--- End diff --

Should we share a single thread pool instead of creating a thread pool 
for every `ParallelUnionRDD`?
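
A rough sketch of what sharing could look like, with the pool held in a companion object and reused by every instance. `PartitionLister` and `sharedPool` are hypothetical stand-ins built on `java.util.concurrent`, not the classes in this PR, and the pool size is an arbitrary choice.

```scala
import java.util.concurrent.{Callable, ExecutorService, Executors, ThreadFactory}

class PartitionLister(sources: Seq[() => Int]) {
  // Submit every source to the shared pool first, then wait for the results in order.
  def counts(): Seq[Int] = {
    val futures = sources.toList.map { s =>
      PartitionLister.sharedPool.submit(new Callable[Int] {
        override def call(): Int = s()
      })
    }
    futures.map(_.get())
  }
}

object PartitionLister {
  // Created once, on first use, and shared by all PartitionLister instances.
  lazy val sharedPool: ExecutorService =
    Executors.newFixedThreadPool(
      Runtime.getRuntime.availableProcessors(),
      new ThreadFactory {
        override def newThread(r: Runnable): Thread = {
          val t = new Thread(r, "shared-partition-lister")
          t.setDaemon(true) // do not keep the JVM alive for an idle pool
          t
        }
      })
}
```

The trade-off is that a shared pool bounds the total number of threads across all instances, at the cost of instances queueing behind each other when many are resolved at once.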





[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2015-11-15 Thread zhonghaihua
Github user zhonghaihua commented on the pull request:

https://github.com/apache/spark/pull/9483#issuecomment-156810714
  
Hi @zhichao-li, thanks for doing this. I had a problem with slow partition 
scanning, so I applied this patch to my Spark version. In my case:
* Before applying this patch, it took at least 3 or 4 minutes to scan 
partitions.
* After applying this patch, it takes only about 20 seconds at this stage.

I am happy to see that it takes effect in my case; it solves my problem. 
Would it be better to add a conf to control whether this feature is used?
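
One possible shape for such a switch, sketched with a plain `Map[String, String]` standing in for the Spark configuration. The key name and the `ParallelListingConf` object are hypothetical; no such setting is defined by this PR.

```scala
object ParallelListingConf {
  // Hypothetical key, for illustration only; not a real Spark setting.
  val Key = "spark.sql.hive.parallelPartitionListing.enabled"

  // Disabled unless the key is explicitly set to "true".
  def enabled(conf: Map[String, String]): Boolean =
    conf.get(Key).exists(_.trim.equalsIgnoreCase("true"))
}
```

Defaulting to `false` would keep the existing sequential behaviour unless a user opts in.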





[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2015-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9483#issuecomment-153930955
  
Merged build started.





[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2015-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9483#issuecomment-153930934
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2015-11-04 Thread zhichao-li
GitHub user zhichao-li opened a pull request:

https://github.com/apache/spark/pull/9483

[SPARK-11517][SQL]Calc partitions in parallel for multiple partitions table

Currently we call `getPartitions` for each "hive partition" sequentially. 
It would be faster to do this in parallel on the driver side.
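
As a rough illustration of the idea (not the code in this PR), the sketch below contrasts resolving a list of sources one by one with resolving them concurrently and then concatenating the results in the original order. `ParallelListing`, `listSequential`, and `listParallel` are hypothetical names, the sources are plain functions rather than RDDs, and the 10-minute timeout is arbitrary.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object ParallelListing {
  // Baseline: resolve each source one after another.
  def listSequential[A](sources: Seq[() => Seq[A]]): Seq[A] =
    sources.flatMap(s => s())

  // Resolve all sources concurrently, then concatenate in the original order.
  def listParallel[A](sources: Seq[() => Seq[A]], threads: Int): Seq[A] = {
    val pool = Executors.newFixedThreadPool(math.max(1, threads))
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
    try {
      val futures = sources.toList.map(s => Future(s()))
      // Future.sequence keeps the input order, so the result matches listSequential.
      Await.result(Future.sequence(futures), 10.minutes).flatten
    } finally {
      pool.shutdown()
    }
  }
}
```

The gain comes from overlapping the waiting time of each lookup, which is why it helps most when each source is slow to resolve (for example, a remote file-system listing).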

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zhichao-li/spark parallelUnionRDD

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/9483.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #9483


commit 63dc9c04cc5d5fc9b815685dab1ba6d5811a999c
Author: zhichao.li 
Date:   2015-11-04T08:28:08Z

parallel







[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2015-11-04 Thread zhichao-li
Github user zhichao-li commented on the pull request:

https://github.com/apache/spark/pull/9483#issuecomment-153930848
  
cc @chenghao-intel 





[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2015-11-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9483#issuecomment-153931045
  
**[Test build #45083 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45083/consoleFull)** for PR 9483 at commit [`63dc9c0`](https://github.com/apache/spark/commit/63dc9c04cc5d5fc9b815685dab1ba6d5811a999c).





[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2015-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9483#issuecomment-153956255
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45083/
Test PASSed.





[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2015-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9483#issuecomment-153956253
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2015-11-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9483#issuecomment-153956188
  
**[Test build #45083 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45083/consoleFull)** for PR 9483 at commit [`63dc9c0`](https://github.com/apache/spark/commit/63dc9c04cc5d5fc9b815685dab1ba6d5811a999c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `class ParallelUnionRDD[T: ClassTag](`





[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2015-11-04 Thread chenghao-intel
Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/9483#issuecomment-153974427
  
cc @scwf @Sephiroth-Lin, not sure if you have time to benchmark this with 
real-world cases.

