[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/10917#issuecomment-182067683 LGTM, merging this into master, thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/10917 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/10917#discussion_r52116834 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/HashBenchmark.scala --- @@ -70,7 +68,14 @@ object HashBenchmark { def main(args: Array[String]): Unit = { val simple = new StructType().add("i", IntegerType) -test("simple", simple, 1024) +/* +Intel(R) Core(TM) i7-4960HQ CPU @ 2.60GHz +Hash For simple:Best/Avg Time(ms)Rate(M/s) Per Row(ns) Relative + --- +interpreted version 941 / 955142.6 7.0 1.0X +codegen version 1737 / 1775 77.3 12.9 0.5X --- End diff -- see the explanation in the PR description. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10917#issuecomment-181028221 **[Test build #50899 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50899/consoleFull)** for PR 10917 at commit [`9a1f8ff`](https://github.com/apache/spark/commit/9a1f8ff85956485a2a7798996ede514045597c26). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10917#issuecomment-181028502 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50899/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10917#issuecomment-181028501 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10917#issuecomment-181011965 **[Test build #50899 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50899/consoleFull)** for PR 10917 at commit [`9a1f8ff`](https://github.com/apache/spark/commit/9a1f8ff85956485a2a7798996ede514045597c26). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/10917#discussion_r52113377 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/HashBenchmark.scala --- @@ -70,7 +68,14 @@ object HashBenchmark { def main(args: Array[String]): Unit = { val simple = new StructType().add("i", IntegerType) -test("simple", simple, 1024) +/* +Intel(R) Core(TM) i7-4960HQ CPU @ 2.60GHz +Hash For simple:Best/Avg Time(ms)Rate(M/s) Per Row(ns) Relative + --- +interpreted version 941 / 955142.6 7.0 1.0X +codegen version 1737 / 1775 77.3 12.9 0.5X --- End diff -- ? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/10917#discussion_r52111265 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/HashBenchmark.scala --- @@ -87,18 +92,39 @@ object HashBenchmark { .add("binary", BinaryType) .add("date", DateType) .add("timestamp", TimestampType) -test("normal", normal, 128) +/* +Intel(R) Core(TM) i7-4960HQ CPU @ 2.60GHz +Hash For normal:Best/Avg Time(ms)Rate(M/s) Per Row(ns) Relative + --- +interpreted version 2209 / 2271 0.9 1053.4 1.0X +codegen version 1887 / 2018 1.1 899.9 1.2X --- End diff -- codegen version is 20% faster, because it doesn't have runtime reflection. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/10917#discussion_r52113370 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/HashBenchmark.scala --- @@ -87,18 +92,39 @@ object HashBenchmark { .add("binary", BinaryType) .add("date", DateType) .add("timestamp", TimestampType) -test("normal", normal, 128) +/* +Intel(R) Core(TM) i7-4960HQ CPU @ 2.60GHz +Hash For normal:Best/Avg Time(ms)Rate(M/s) Per Row(ns) Relative + --- +interpreted version 2209 / 2271 0.9 1053.4 1.0X +codegen version 1887 / 2018 1.1 899.9 1.2X --- End diff -- Sorry, I read it wrong. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/10917#discussion_r52113367 --- Diff: core/src/main/scala/org/apache/spark/util/Benchmark.scala --- @@ -124,7 +124,7 @@ private[spark] object Benchmark { } val best = runTimes.min val avg = runTimes.sum / iters -Result(avg / 100, num / (best / 1000), best / 100) +Result(avg / 100, num.toDouble / (best / 1000), best / 100) --- End diff -- Hot fixed in master. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/10917#discussion_r52098713 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/HashBenchmark.scala --- @@ -87,18 +92,39 @@ object HashBenchmark { .add("binary", BinaryType) .add("date", DateType) .add("timestamp", TimestampType) -test("normal", normal, 128) +/* +Intel(R) Core(TM) i7-4960HQ CPU @ 2.60GHz +Hash For normal:Best/Avg Time(ms)Rate(M/s) Per Row(ns) Relative + --- +interpreted version 2209 / 2271 0.9 1053.4 1.0X +codegen version 1887 / 2018 1.1 899.9 1.2X --- End diff -- Why the generated version is slower? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/10917#issuecomment-180519162 cc @cloud-fan on follow-up --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/10917#discussion_r52095963 --- Diff: core/src/main/scala/org/apache/spark/util/Benchmark.scala --- @@ -124,7 +124,7 @@ private[spark] object Benchmark { } val best = runTimes.min val avg = runTimes.sum / iters -Result(avg / 100, num / (best / 1000), best / 100) +Result(avg / 100, num.toDouble / (best / 1000), best / 100) --- End diff -- We need to keep the precision here. The `bestMs/avgMs` can be well controlled in an appropriate number, but the `rate` can't. And we use `rate` as a divisor later, so if `rate` is small(assume we are benchmarking some slow operations), we will get large deviation. BTW, we use `%10.1f` to print `rate` but previously `rate` is always integral. cc @davies --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10917#issuecomment-180661399 **[Test build #50858 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50858/consoleFull)** for PR 10917 at commit [`315af8c`](https://github.com/apache/spark/commit/315af8cdb87a8033e86f2c6c318b615f79bfc3da). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10917#issuecomment-180660844 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10917#issuecomment-180660845 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50856/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10917#issuecomment-180679947 **[Test build #50858 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50858/consoleFull)** for PR 10917 at commit [`315af8c`](https://github.com/apache/spark/commit/315af8cdb87a8033e86f2c6c318b615f79bfc3da). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10917#issuecomment-180680058 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10917#issuecomment-180680059 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50858/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10917#issuecomment-179094519 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50653/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10917#issuecomment-179094513 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10917#issuecomment-179094018 **[Test build #50653 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50653/consoleFull)** for PR 10917 at commit [`4104d80`](https://github.com/apache/spark/commit/4104d80bb4a815fa816b5d4ee2123d13e7e53bdf). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...
Github user nongli commented on a diff in the pull request: https://github.com/apache/spark/pull/10917#discussion_r51771845 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/HashBenchmark.scala --- @@ -87,18 +94,39 @@ object HashBenchmark { .add("binary", BinaryType) .add("date", DateType) .add("timestamp", TimestampType) -test("normal", normal, 128) +/* +Intel(R) Core(TM) i7-4960HQ CPU @ 2.60GHz +Hash For normal: Avg Time(ms)Avg Rate(M/s) Relative Rate + --- +interpreted version 2187.63 0.96 1.00 X +codegen version 1693.21 1.24 1.29 X + */ +test("normal", normal, 1024 * 4) val arrayOfInt = ArrayType(IntegerType) val array = new StructType() .add("array", arrayOfInt) .add("arrayOfArray", ArrayType(arrayOfInt)) -test("array", array, 64) +/* +Intel(R) Core(TM) i7-4960HQ CPU @ 2.60GHz +Hash For array:Avg Time(ms)Avg Rate(M/s) Relative Rate + --- +interpreted version 3290.06 0.08 1.00 X +codegen version 6674.07 0.04 0.49 X + */ +test("array", array, 512) val mapOfInt = MapType(IntegerType, IntegerType) val map = new StructType() .add("map", mapOfInt) .add("mapOfMap", MapType(IntegerType, mapOfInt)) -test("map", map, 64) +/* +Intel(R) Core(TM) i7-4960HQ CPU @ 2.60GHz +Hash For map: Avg Time(ms)Avg Rate(M/s) Relative Rate + --- +interpreted version64709.73 0.00 1.00 X --- End diff -- How long does this benchmark take to run? THis looks really long. I think we should keep benchmarks to run in low number of seconds total if possible. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10917#issuecomment-179021435 **[Test build #50640 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50640/consoleFull)** for PR 10917 at commit [`4104d80`](https://github.com/apache/spark/commit/4104d80bb4a815fa816b5d4ee2123d13e7e53bdf). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10917#issuecomment-179021446 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10917#issuecomment-179021452 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50640/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/10917#issuecomment-179022386 retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10917#issuecomment-179024646 **[Test build #50642 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50642/consoleFull)** for PR 10917 at commit [`4104d80`](https://github.com/apache/spark/commit/4104d80bb4a815fa816b5d4ee2123d13e7e53bdf). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/10917#issuecomment-179009533 test this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10917#issuecomment-179017448 **[Test build #50640 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50640/consoleFull)** for PR 10917 at commit [`4104d80`](https://github.com/apache/spark/commit/4104d80bb4a815fa816b5d4ee2123d13e7e53bdf). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10917#issuecomment-179036664 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50642/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10917#issuecomment-179035992 **[Test build #50642 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50642/consoleFull)** for PR 10917 at commit [`4104d80`](https://github.com/apache/spark/commit/4104d80bb4a815fa816b5d4ee2123d13e7e53bdf). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10917#issuecomment-179036657 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/10917#issuecomment-179058793 So many flaky tests... --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/10917#issuecomment-179058803 retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10917#issuecomment-179063245 **[Test build #50653 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50653/consoleFull)** for PR 10917 at commit [`4104d80`](https://github.com/apache/spark/commit/4104d80bb4a815fa816b5d4ee2123d13e7e53bdf). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10917#issuecomment-178959044 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10917#issuecomment-178959049 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50634/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10917#issuecomment-178959023 **[Test build #50634 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50634/consoleFull)** for PR 10917 at commit [`4104d80`](https://github.com/apache/spark/commit/4104d80bb4a815fa816b5d4ee2123d13e7e53bdf). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...
Github user nongli commented on the pull request: https://github.com/apache/spark/pull/10917#issuecomment-175137240 LGTM. We can have different hash functions with different entropy later but this seems okay to me. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/10917#issuecomment-175296051 The map case should not be that faster for the codegen version, because of a generator bug that has been fixed at https://github.com/apache/spark/pull/10930. Should we merge this one first and update map case result after #10930 is merged? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...
Github user nongli commented on the pull request: https://github.com/apache/spark/pull/10917#issuecomment-175297221 No, let's re-run them when the results are easier to explain. Can you also tune the iterations so that the iterations is a higher value. The harness does some rounding with the less significant digits. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10917#issuecomment-174849282 **[Test build #50072 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50072/consoleFull)** for PR 10917 at commit [`8207dc1`](https://github.com/apache/spark/commit/8207dc109f21527438cbd80894e9b49d63159f12). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10917#issuecomment-174874774 **[Test build #50072 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50072/consoleFull)** for PR 10917 at commit [`8207dc1`](https://github.com/apache/spark/commit/8207dc109f21527438cbd80894e9b49d63159f12). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/10917#issuecomment-174874622 @nongli It's not doing anything to get the hash code of int field, but do a [simple multiplication and addition](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/rows.scala#L153) to get the hash code of the row. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...
Github user nongli commented on the pull request: https://github.com/apache/spark/pull/10917#issuecomment-174866182 @cloud-fan Simple is just a single int right? It's not even doing anything in the previous case? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/10917#issuecomment-174846863 cc @nongli @rxin --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/10917 [SPARK-12888][SQL][follow-up] benchmark the new hash expression Adds the benchmark results as comments. The codegen version is slower than the interpreted version for `simple` case becasue of 3 reasons: 1. codegen version use a more complex hash algorithm than interpreted version, i.e. `Murmur3_x86_32.hashInt` vs [simple multiplication and addition](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/rows.scala#L153). 2. codegen version will write the hash value to a row first and then read it out. I tried to create a `GenerateHasher` that can generate code to return hash value directly and got about 60% speed up for the `simple` case, does it worth? 3. the row in `simple` case only has one int field, so the runtime reflection may be removed because of branch prediction, which makes the interpreted version faster. The `array` case is also slow for similar reasons, e.g. array elements are of same type, so interpreted version can probably get rid of runtime reflection by branch prediction. You can merge this pull request into a Git repository by running: $ git pull https://github.com/cloud-fan/spark hash-benchmark Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/10917.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #10917 commit 8207dc109f21527438cbd80894e9b49d63159f12 Author: Wenchen FanDate: 2016-01-26T02:24:38Z add benchmark results --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10917#issuecomment-174875252 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10917#issuecomment-174875254 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50072/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/10917#issuecomment-174847221 @nongli maybe we should just use the simpler multiplication and addition? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org