[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-02-09 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/10917#issuecomment-182067683
  
LGTM, merging this into master, thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-02-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/10917


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-02-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/10917#discussion_r52116834
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/HashBenchmark.scala ---
@@ -70,7 +68,14 @@ object HashBenchmark {
 
   def main(args: Array[String]): Unit = {
 val simple = new StructType().add("i", IntegerType)
-test("simple", simple, 1024)
+/*
+Intel(R) Core(TM) i7-4960HQ CPU @ 2.60GHz
+Hash For simple:Best/Avg Time(ms)Rate(M/s)   
Per Row(ns)   Relative
+
---
+interpreted version   941 /  955142.6  
 7.0   1.0X
+codegen version  1737 / 1775 77.3  
12.9   0.5X
--- End diff --

see the explanation in the PR description.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-02-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10917#issuecomment-181028221
  
**[Test build #50899 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50899/consoleFull)**
 for PR 10917 at commit 
[`9a1f8ff`](https://github.com/apache/spark/commit/9a1f8ff85956485a2a7798996ede514045597c26).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-02-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10917#issuecomment-181028502
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50899/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-02-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10917#issuecomment-181028501
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-02-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10917#issuecomment-181011965
  
**[Test build #50899 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50899/consoleFull)**
 for PR 10917 at commit 
[`9a1f8ff`](https://github.com/apache/spark/commit/9a1f8ff85956485a2a7798996ede514045597c26).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-02-06 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/10917#discussion_r52113377
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/HashBenchmark.scala ---
@@ -70,7 +68,14 @@ object HashBenchmark {
 
   def main(args: Array[String]): Unit = {
 val simple = new StructType().add("i", IntegerType)
-test("simple", simple, 1024)
+/*
+Intel(R) Core(TM) i7-4960HQ CPU @ 2.60GHz
+Hash For simple:Best/Avg Time(ms)Rate(M/s)   
Per Row(ns)   Relative
+
---
+interpreted version   941 /  955142.6  
 7.0   1.0X
+codegen version  1737 / 1775 77.3  
12.9   0.5X
--- End diff --

?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-02-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/10917#discussion_r52111265
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/HashBenchmark.scala ---
@@ -87,18 +92,39 @@ object HashBenchmark {
   .add("binary", BinaryType)
   .add("date", DateType)
   .add("timestamp", TimestampType)
-test("normal", normal, 128)
+/*
+Intel(R) Core(TM) i7-4960HQ CPU @ 2.60GHz
+Hash For normal:Best/Avg Time(ms)Rate(M/s)   
Per Row(ns)   Relative
+
---
+interpreted version  2209 / 2271  0.9  
  1053.4   1.0X
+codegen version  1887 / 2018  1.1  
   899.9   1.2X
--- End diff --

codegen version is 20% faster, because it doesn't have runtime reflection.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-02-06 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/10917#discussion_r52113370
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/HashBenchmark.scala ---
@@ -87,18 +92,39 @@ object HashBenchmark {
   .add("binary", BinaryType)
   .add("date", DateType)
   .add("timestamp", TimestampType)
-test("normal", normal, 128)
+/*
+Intel(R) Core(TM) i7-4960HQ CPU @ 2.60GHz
+Hash For normal:Best/Avg Time(ms)Rate(M/s)   
Per Row(ns)   Relative
+
---
+interpreted version  2209 / 2271  0.9  
  1053.4   1.0X
+codegen version  1887 / 2018  1.1  
   899.9   1.2X
--- End diff --

Sorry, I read it wrong.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-02-06 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/10917#discussion_r52113367
  
--- Diff: core/src/main/scala/org/apache/spark/util/Benchmark.scala ---
@@ -124,7 +124,7 @@ private[spark] object Benchmark {
 }
 val best = runTimes.min
 val avg = runTimes.sum / iters
-Result(avg / 100, num / (best / 1000), best / 100)
+Result(avg / 100, num.toDouble / (best / 1000), best / 100)
--- End diff --

Hot fixed in master.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-02-05 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/10917#discussion_r52098713
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/HashBenchmark.scala ---
@@ -87,18 +92,39 @@ object HashBenchmark {
   .add("binary", BinaryType)
   .add("date", DateType)
   .add("timestamp", TimestampType)
-test("normal", normal, 128)
+/*
+Intel(R) Core(TM) i7-4960HQ CPU @ 2.60GHz
+Hash For normal:Best/Avg Time(ms)Rate(M/s)   
Per Row(ns)   Relative
+
---
+interpreted version  2209 / 2271  0.9  
  1053.4   1.0X
+codegen version  1887 / 2018  1.1  
   899.9   1.2X
--- End diff --

Why the generated version is slower?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-02-05 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/10917#issuecomment-180519162
  
cc @cloud-fan on follow-up


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-02-05 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/10917#discussion_r52095963
  
--- Diff: core/src/main/scala/org/apache/spark/util/Benchmark.scala ---
@@ -124,7 +124,7 @@ private[spark] object Benchmark {
 }
 val best = runTimes.min
 val avg = runTimes.sum / iters
-Result(avg / 100, num / (best / 1000), best / 100)
+Result(avg / 100, num.toDouble / (best / 1000), best / 100)
--- End diff --

We need to keep the precision here.  The `bestMs/avgMs` can be well 
controlled in an appropriate number, but the `rate` can't. And we use `rate` as 
a divisor later, so if `rate` is small(assume we are benchmarking some slow 
operations), we will get large deviation. BTW, we use `%10.1f` to print `rate` 
but previously `rate` is always integral.

cc @davies 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-02-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10917#issuecomment-180661399
  
**[Test build #50858 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50858/consoleFull)**
 for PR 10917 at commit 
[`315af8c`](https://github.com/apache/spark/commit/315af8cdb87a8033e86f2c6c318b615f79bfc3da).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-02-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10917#issuecomment-180660844
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-02-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10917#issuecomment-180660845
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50856/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-02-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10917#issuecomment-180679947
  
**[Test build #50858 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50858/consoleFull)**
 for PR 10917 at commit 
[`315af8c`](https://github.com/apache/spark/commit/315af8cdb87a8033e86f2c6c318b615f79bfc3da).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-02-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10917#issuecomment-180680058
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-02-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10917#issuecomment-180680059
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50858/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-02-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10917#issuecomment-179094519
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50653/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-02-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10917#issuecomment-179094513
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-02-03 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10917#issuecomment-179094018
  
**[Test build #50653 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50653/consoleFull)**
 for PR 10917 at commit 
[`4104d80`](https://github.com/apache/spark/commit/4104d80bb4a815fa816b5d4ee2123d13e7e53bdf).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-02-03 Thread nongli
Github user nongli commented on a diff in the pull request:

https://github.com/apache/spark/pull/10917#discussion_r51771845
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/HashBenchmark.scala ---
@@ -87,18 +94,39 @@ object HashBenchmark {
   .add("binary", BinaryType)
   .add("date", DateType)
   .add("timestamp", TimestampType)
-test("normal", normal, 128)
+/*
+Intel(R) Core(TM) i7-4960HQ CPU @ 2.60GHz
+Hash For normal:   Avg Time(ms)Avg Rate(M/s)  
Relative Rate
+
---
+interpreted version 2187.63 0.96   
  1.00 X
+codegen version 1693.21 1.24   
  1.29 X
+ */
+test("normal", normal, 1024 * 4)
 
 val arrayOfInt = ArrayType(IntegerType)
 val array = new StructType()
   .add("array", arrayOfInt)
   .add("arrayOfArray", ArrayType(arrayOfInt))
-test("array", array, 64)
+/*
+Intel(R) Core(TM) i7-4960HQ CPU @ 2.60GHz
+Hash For array:Avg Time(ms)Avg Rate(M/s)  
Relative Rate
+
---
+interpreted version 3290.06 0.08   
  1.00 X
+codegen version 6674.07 0.04   
  0.49 X
+ */
+test("array", array, 512)
 
 val mapOfInt = MapType(IntegerType, IntegerType)
 val map = new StructType()
   .add("map", mapOfInt)
   .add("mapOfMap", MapType(IntegerType, mapOfInt))
-test("map", map, 64)
+/*
+Intel(R) Core(TM) i7-4960HQ CPU @ 2.60GHz
+Hash For map:  Avg Time(ms)Avg Rate(M/s)  
Relative Rate
+
---
+interpreted version64709.73 0.00   
  1.00 X
--- End diff --

How long does this benchmark take to run? THis looks really long. I think 
we should keep benchmarks to run in low number of seconds total if possible.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-02-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10917#issuecomment-179021435
  
**[Test build #50640 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50640/consoleFull)**
 for PR 10917 at commit 
[`4104d80`](https://github.com/apache/spark/commit/4104d80bb4a815fa816b5d4ee2123d13e7e53bdf).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-02-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10917#issuecomment-179021446
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-02-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10917#issuecomment-179021452
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50640/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-02-02 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/10917#issuecomment-179022386
  
retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-02-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10917#issuecomment-179024646
  
**[Test build #50642 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50642/consoleFull)**
 for PR 10917 at commit 
[`4104d80`](https://github.com/apache/spark/commit/4104d80bb4a815fa816b5d4ee2123d13e7e53bdf).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-02-02 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/10917#issuecomment-179009533
  
test this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-02-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10917#issuecomment-179017448
  
**[Test build #50640 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50640/consoleFull)**
 for PR 10917 at commit 
[`4104d80`](https://github.com/apache/spark/commit/4104d80bb4a815fa816b5d4ee2123d13e7e53bdf).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-02-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10917#issuecomment-179036664
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50642/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-02-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10917#issuecomment-179035992
  
**[Test build #50642 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50642/consoleFull)**
 for PR 10917 at commit 
[`4104d80`](https://github.com/apache/spark/commit/4104d80bb4a815fa816b5d4ee2123d13e7e53bdf).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-02-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10917#issuecomment-179036657
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-02-02 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/10917#issuecomment-179058793
  
So many flaky tests...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-02-02 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/10917#issuecomment-179058803
  
retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-02-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10917#issuecomment-179063245
  
**[Test build #50653 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50653/consoleFull)**
 for PR 10917 at commit 
[`4104d80`](https://github.com/apache/spark/commit/4104d80bb4a815fa816b5d4ee2123d13e7e53bdf).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-02-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10917#issuecomment-178959044
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-02-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10917#issuecomment-178959049
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50634/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-02-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10917#issuecomment-178959023
  
**[Test build #50634 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50634/consoleFull)**
 for PR 10917 at commit 
[`4104d80`](https://github.com/apache/spark/commit/4104d80bb4a815fa816b5d4ee2123d13e7e53bdf).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-01-26 Thread nongli
Github user nongli commented on the pull request:

https://github.com/apache/spark/pull/10917#issuecomment-175137240
  
LGTM. We can have different hash functions with different entropy later but 
this seems okay to me.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-01-26 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/10917#issuecomment-175296051
  
The map case should not be that faster for the codegen version, because of 
a generator bug that has been fixed at 
https://github.com/apache/spark/pull/10930. Should we merge this one first and 
update map case result after #10930 is merged?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-01-26 Thread nongli
Github user nongli commented on the pull request:

https://github.com/apache/spark/pull/10917#issuecomment-175297221
  
No, let's re-run them when the results are easier to explain. Can you also 
tune the iterations so that the iterations is a higher value. The harness does 
some rounding with the less significant digits.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-01-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10917#issuecomment-174849282
  
**[Test build #50072 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50072/consoleFull)**
 for PR 10917 at commit 
[`8207dc1`](https://github.com/apache/spark/commit/8207dc109f21527438cbd80894e9b49d63159f12).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-01-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10917#issuecomment-174874774
  
**[Test build #50072 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50072/consoleFull)**
 for PR 10917 at commit 
[`8207dc1`](https://github.com/apache/spark/commit/8207dc109f21527438cbd80894e9b49d63159f12).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-01-25 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/10917#issuecomment-174874622
  
@nongli  It's not doing anything to get the hash code of int field, but do 
a [simple multiplication and 
addition](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/rows.scala#L153)
 to get the hash code of the row.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-01-25 Thread nongli
Github user nongli commented on the pull request:

https://github.com/apache/spark/pull/10917#issuecomment-174866182
  
@cloud-fan Simple is just a single int right? It's not even doing anything 
in the previous case?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-01-25 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/10917#issuecomment-174846863
  
cc @nongli @rxin 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-01-25 Thread cloud-fan
GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/10917

[SPARK-12888][SQL][follow-up] benchmark the new hash expression

Adds the benchmark results as comments.

The codegen version is slower than the interpreted version for `simple` 
case becasue of 3 reasons:

1. codegen version use a more complex hash algorithm than interpreted 
version, i.e. `Murmur3_x86_32.hashInt` vs [simple multiplication and 
addition](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/rows.scala#L153).
2. codegen version will write the hash value to a row first and then read 
it out. I tried to create a `GenerateHasher` that can generate code to return 
hash value directly and got about 60% speed up for the `simple` case, does it 
worth?
3. the row in `simple` case only has one int field, so the runtime 
reflection may be removed because of branch prediction, which makes the 
interpreted version faster.

The `array` case is also slow for similar reasons, e.g. array elements are 
of same type, so interpreted version can probably get rid of runtime reflection 
by branch prediction.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark hash-benchmark

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10917.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10917


commit 8207dc109f21527438cbd80894e9b49d63159f12
Author: Wenchen Fan 
Date:   2016-01-26T02:24:38Z

add benchmark results




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10917#issuecomment-174875252
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10917#issuecomment-174875254
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50072/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-01-25 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/10917#issuecomment-174847221
  
@nongli maybe we should just use the simpler multiplication and addition?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org