[GitHub] spark issue #21760: [SPARK-24776][SQL]Avro unit test: use SQLTestUtils and r...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21760 **[Test build #92971 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92971/testReport)** for PR 21760 at commit [`26b88ca`](https://github.com/apache/spark/commit/26b88ca201a70283528f289cdd2e1e216fce6e7a). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21760: [SPARK-24776][SQL]Avro unit test: use SQLTestUtils and r...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21760 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/924/ Test PASSed.
[GitHub] spark issue #21760: [SPARK-24776][SQL]Avro unit test: use SQLTestUtils and r...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21760 Merged build finished. Test PASSed.
[GitHub] spark pull request #21760: [SPARK-24776][SQL]Avro unit test: use SQLTestUtil...
GitHub user gengliangwang opened a pull request: https://github.com/apache/spark/pull/21760 [SPARK-24776][SQL]Avro unit test: use SQLTestUtils and replace deprecated methods ## What changes were proposed in this pull request? Improve Avro unit test: 1. use QueryTest/SharedSQLContext/SQLTestUtils, instead of the duplicated test utils. 2. replace deprecated methods ## How was this patch tested? Unit test You can merge this pull request into a Git repository by running: $ git pull https://github.com/gengliangwang/spark improve_avro_test Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21760.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21760 commit 26b88ca201a70283528f289cdd2e1e216fce6e7a Author: Gengliang Wang Date: 2018-07-13T11:41:56Z improve AvroSuite
[GitHub] spark issue #21745: [SPARK-24781][SQL] Using a reference from Dataset in Fil...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21745 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92964/ Test PASSed.
[GitHub] spark issue #21745: [SPARK-24781][SQL] Using a reference from Dataset in Fil...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21745 Merged build finished. Test PASSed.
[GitHub] spark issue #21745: [SPARK-24781][SQL] Using a reference from Dataset in Fil...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21745 **[Test build #92964 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92964/testReport)** for PR 21745 at commit [`9e00db9`](https://github.com/apache/spark/commit/9e00db938ddc6293899170e19b41530b22fb525a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #21759: sfas
Github user marymwu closed the pull request at: https://github.com/apache/spark/pull/21759
[GitHub] spark pull request #21759: sfas
GitHub user marymwu opened a pull request: https://github.com/apache/spark/pull/21759 sfas ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) ## How was this patch tested? (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/marymwu/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21759.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21759 commit dcf36ad54598118408c1425e81aa6552f42328c8 Author: Dongjoon Hyun Date: 2016-05-03T13:02:04Z [SPARK-15057][GRAPHX] Remove stale TODO comment for making `enum` in GraphGenerators This PR removes a stale TODO comment in `GraphGenerators.scala` Just comment removed. Author: Dongjoon Hyun Closes #12839 from dongjoon-hyun/SPARK-15057. (cherry picked from commit 46965cd014fd4ba68bdec15156ec9bcc27d9b217) Signed-off-by: Reynold Xin commit 1dc30f189ac30f070068ca5f60b7b4c85f2adc9e Author: Bryan Cutler Date: 2016-05-19T02:48:36Z [DOC][MINOR] ml.feature Scala and Python API sync I reviewed Scala and Python APIs for ml.feature and corrected discrepancies. Built docs locally, ran style checks Author: Bryan Cutler Closes #13159 from BryanCutler/ml.feature-api-sync. 
(cherry picked from commit b1bc5ebdd52ed12aea3fdc7b8f2fa2d00ea09c6b) Signed-off-by: Reynold Xin commit 642f00980f1de13a0f6d1dc8bc7ed5b0547f3a9d Author: Zheng RuiFeng Date: 2016-05-15T14:59:49Z [MINOR] Fix Typos 1,Rename matrix args in BreezeUtil to upper to match the doc 2,Fix several typos in ML and SQL manual tests Author: Zheng RuiFeng Closes #13078 from zhengruifeng/fix_ann. (cherry picked from commit c7efc56c7b6fc99c005b35c335716ff676856c6c) Signed-off-by: Reynold Xin commit 2126fb0c2b2bb8ac4c5338df15182fcf8713fb2f Author: Sandeep Singh Date: 2016-05-19T09:44:26Z [CORE][MINOR] Remove redundant set master in OutputCommitCoordinatorIntegrationSuite Remove redundant set master in OutputCommitCoordinatorIntegrationSuite, as we are already setting it in SparkContext below on line 43. existing tests Author: Sandeep Singh Closes #13168 from techaddict/minor-1. (cherry picked from commit 3facca5152e685d9c7da96bff5102169740a4a06) Signed-off-by: Reynold Xin commit 1fc0f95eb8abbb9cc8ede2139670e493e6939317 Author: Andrew Or Date: 2016-05-20T05:40:03Z [HOTFIX] Test compilation error from 52b967f commit dd0c7fb39cac44e8f0d73f9884fd1582c25e9cf4 Author: Reynold Xin Date: 2016-05-20T05:46:08Z Revert "[HOTFIX] Test compilation error from 52b967f" This reverts commit 1fc0f95eb8abbb9cc8ede2139670e493e6939317. commit f8d0177c31d43eab59a7535945f3dfa24e906273 Author: Davies Liu Date: 2016-05-18T23:02:52Z Revert "[SPARK-15392][SQL] fix default value of size estimation of logical plan" This reverts commit fc29b896dae08b957ed15fa681b46162600a4050. (cherry picked from commit 84b23453ddb0a97e3d81306de0a5dcb64f88bdd0) Signed-off-by: Reynold Xin commit 2ef645724a7f229309a87c5053b0fbdf45d06f52 Author: Takuya UESHIN Date: 2016-05-20T05:55:44Z [SPARK-15313][SQL] EmbedSerializerInFilter rule should keep exprIds of output of surrounded SerializeFromObject. ## What changes were proposed in this pull request? 
The following code: ``` val ds = Seq(("a", 1), ("b", 2), ("c", 3)).toDS() ds.filter(_._1 == "b").select(expr("_1").as[String]).foreach(println(_)) ``` throws an Exception: ``` org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding attribute, tree: _1#420 at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:50) at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:88) at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:87) ... Cause: java.lang.RuntimeException: Couldn't find _1#420 in [_1#416,_2#417] at scala.sys.package$.error(package.scala:27) at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1$$anonfun$applyOrElse$1.apply(BoundAttribute.scala:94) at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1$$
[GitHub] spark pull request #21603: [SPARK-17091][SQL] Add rule to convert IN predica...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/21603#discussion_r202302865 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -222,6 +225,14 @@ private[parquet] class ParquetFilters(pushDownDate: Boolean, pushDownStartWith: // See SPARK-20364. def canMakeFilterOn(name: String): Boolean = nameToType.contains(name) && !name.contains(".") +// All DataTypes that support `makeEq` can provide better performance. +def shouldConvertInPredicate(name: String): Boolean = nameToType(name) match { --- End diff -- @HyukjinKwon How about removing this? `Timestamp` type and `Decimal` type will be supported soon.
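For readers following the SPARK-17091 thread: the rule under discussion rewrites an `IN` predicate into a chain of OR-ed equality filters so that Parquet's `makeEq` path can be used. A minimal, self-contained sketch of that idea, using simplified stand-in case classes (hypothetical `convertIn` helper; the real code works on `org.apache.spark.sql.sources.Filter` and Parquet's `FilterApi`):

```scala
// Simplified stand-ins for illustration only; not the real Spark filter classes.
sealed trait Filter
case class Eq(name: String, value: Any) extends Filter
case class Or(left: Filter, right: Filter) extends Filter
case class In(name: String, values: Seq[Any]) extends Filter

// Rewrite In(a, [v1, v2, ...]) into Eq(a, v1) OR Eq(a, v2) OR ...,
// but only when the number of distinct values stays within the threshold,
// mirroring the `values.distinct.length <= pushDownInFilterThreshold` guard
// quoted later in this thread.
def convertIn(in: In, threshold: Int): Option[Filter] = {
  val distinctValues = in.values.distinct
  if (distinctValues.nonEmpty && distinctValues.length <= threshold) {
    Some(distinctValues.map(v => Eq(in.name, v): Filter).reduceLeft((l, r) => Or(l, r)))
  } else {
    None  // too many values: leave the IN predicate unconverted
  }
}
```

This also shows why the threshold debate below matters: with a threshold of 0 (or below) the rule can never fire, since the distinct-value count of a non-empty list is at least 1.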
[GitHub] spark issue #20611: [SPARK-23425][SQL]Support wildcard in HDFS path for load...
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/20611 @srowen Thanks for the review. All comments have been addressed from my side. Let me know if you need any clarification.
[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21102 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/923/ Test PASSed.
[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21102 Merged build finished. Test PASSed.
[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21102 **[Test build #92970 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92970/testReport)** for PR 21102 at commit [`fce9eb0`](https://github.com/apache/spark/commit/fce9eb09bf0666711dbb5584c56b2534e495dffc).
[GitHub] spark issue #21505: [SPARK-24457][SQL] Improving performance of stringToTime...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21505 Merged build finished. Test FAILed.
[GitHub] spark issue #21505: [SPARK-24457][SQL] Improving performance of stringToTime...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21505 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92969/ Test FAILed.
[GitHub] spark issue #21505: [SPARK-24457][SQL] Improving performance of stringToTime...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21505 **[Test build #92969 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92969/testReport)** for PR 21505 at commit [`c940381`](https://github.com/apache/spark/commit/c940381a0be36fd227e8f63caf32d3be86c5aa69). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21505: [SPARK-24457][SQL] Improving performance of stringToTime...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21505 **[Test build #92969 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92969/testReport)** for PR 21505 at commit [`c940381`](https://github.com/apache/spark/commit/c940381a0be36fd227e8f63caf32d3be86c5aa69).
[GitHub] spark issue #21505: [SPARK-24457][SQL] Improving performance of stringToTime...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21505 ok to test
[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19789 @daroo, mind reopening this if you have some time to update?
[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19789 ok to test
[GitHub] spark issue #18113: [SPARK-20890][SQL] Added min and max typed aggregation f...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18113 @setjet, mind updating this please?
[GitHub] spark pull request #21741: [SPARK-24718][SQL] Timestamp support pushdown to ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21741#discussion_r202287558 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -378,6 +378,15 @@ object SQLConf { .booleanConf .createWithDefault(true) + val PARQUET_FILTER_PUSHDOWN_TIMESTAMP_ENABLED = +buildConf("spark.sql.parquet.filterPushdown.timestamp") + .doc("If true, enables Parquet filter push-down optimization for Timestamp. " + +"This configuration only has an effect when 'spark.sql.parquet.filterPushdown' is " + +"enabled and Timestamp stored as TIMESTAMP_MICROS or TIMESTAMP_MILLIS type.") --- End diff -- ... I don't think users will understand any of them ..
[GitHub] spark pull request #21603: [SPARK-17091][SQL] Add rule to convert IN predica...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21603#discussion_r202286983 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -386,6 +386,17 @@ object SQLConf { .booleanConf .createWithDefault(true) + val PARQUET_FILTER_PUSHDOWN_INFILTERTHRESHOLD = +buildConf("spark.sql.parquet.pushdown.inFilterThreshold") + .doc("The maximum number of values to filter push-down optimization for IN predicate. " + +"Large threshold won't necessarily provide much better performance. " + +"The experiment argued that 300 is the limit threshold. " + +"This configuration only has an effect when 'spark.sql.parquet.filterPushdown' is enabled.") + .internal() + .intConf + .checkValue(threshold => threshold > 0, "The threshold must be greater than 0.") --- End diff -- Yup.
[GitHub] spark pull request #21603: [SPARK-17091][SQL] Add rule to convert IN predica...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21603#discussion_r202286636 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -386,6 +386,17 @@ object SQLConf { .booleanConf .createWithDefault(true) + val PARQUET_FILTER_PUSHDOWN_INFILTERTHRESHOLD = +buildConf("spark.sql.parquet.pushdown.inFilterThreshold") + .doc("The maximum number of values to filter push-down optimization for IN predicate. " + +"Large threshold won't necessarily provide much better performance. " + +"The experiment argued that 300 is the limit threshold. " + +"This configuration only has an effect when 'spark.sql.parquet.filterPushdown' is enabled.") + .internal() + .intConf + .checkValue(threshold => threshold > 0, "The threshold must be greater than 0.") --- End diff -- Let's use `-1`. Seems that's more consistent in the configurations.
[GitHub] spark issue #20915: [SPARK-23803][SQL] Support bucket pruning
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20915 @cloud-fan, how does it relate to SPARK-23803, SPARK-12850 and SPARK-23507? I was about to take action on the JIRAs but thought it better to make sure first. SPARK-12850 was merged in 2.0.0 but reverted by SPARK-14535 in 2.0.0, so that one is no problem, but SPARK-23803 duplicates SPARK-12850. And do you plan to migrate the file-based sources to Data Source V2, which would include refactoring this feature?
[GitHub] spark pull request #21603: [SPARK-17091][SQL] Add rule to convert IN predica...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/21603#discussion_r202283085 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -386,6 +386,17 @@ object SQLConf { .booleanConf .createWithDefault(true) + val PARQUET_FILTER_PUSHDOWN_INFILTERTHRESHOLD = +buildConf("spark.sql.parquet.pushdown.inFilterThreshold") + .doc("The maximum number of values to filter push-down optimization for IN predicate. " + +"Large threshold won't necessarily provide much better performance. " + +"The experiment argued that 300 is the limit threshold. " + +"This configuration only has an effect when 'spark.sql.parquet.filterPushdown' is enabled.") + .internal() + .intConf + .checkValue(threshold => threshold > 0, "The threshold must be greater than 0.") --- End diff -- ```scala case sources.In(name, values) if canMakeFilterOn(name) && shouldConvertInPredicate(name) && values.distinct.length <= pushDownInFilterThreshold => ``` How about `0`? `values.distinct.length` will never be less than `0`.
[GitHub] spark issue #21386: [SPARK-23928][SQL][WIP] Add shuffle collection function.
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/21386 @pkuwm Hi, any updates on this? If you have any questions, please let us know. Thanks!
[GitHub] spark issue #21704: [SPARK-24734][SQL] Fix type coercions and nullabilities ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21704 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/922/ Test PASSed.
[GitHub] spark issue #21704: [SPARK-24734][SQL] Fix type coercions and nullabilities ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21704 Merged build finished. Test PASSed.
[GitHub] spark issue #21704: [SPARK-24734][SQL] Fix type coercions and nullabilities ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21704 **[Test build #92967 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92967/testReport)** for PR 21704 at commit [`5115961`](https://github.com/apache/spark/commit/5115961fb0503cabbdbdead7c29c1521ab4f76cb).
[GitHub] spark issue #20611: [SPARK-23425][SQL]Support wildcard in HDFS path for load...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20611 **[Test build #92968 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92968/testReport)** for PR 20611 at commit [`bee161f`](https://github.com/apache/spark/commit/bee161f07ae4f76a0f090f64ac84c39f752652ce).
[GitHub] spark pull request #21704: [SPARK-24734][SQL] Fix type coercions and nullabi...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21704#discussion_r202278265 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -259,8 +270,22 @@ object TypeCoercion { } } - private def haveSameType(exprs: Seq[Expression]): Boolean = -exprs.map(_.dataType).distinct.length == 1 + private def haveSameType(exprs: Seq[Expression]): Boolean = { --- End diff -- Since we have `CreateMap`, we can't make all such expressions `ComplexTypeMergingExpression`. I'd apply 2) approach.
[GitHub] spark pull request #21741: [SPARK-24718][SQL] Timestamp support pushdown to ...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/21741#discussion_r202277812 --- Diff: sql/core/benchmarks/FilterPushdownBenchmark-results.txt --- @@ -578,3 +578,127 @@ Native ORC Vectorized 11622 / 12196 1.4 7 Native ORC Vectorized (Pushdown)11377 / 11654 1.4 723.3 1.0X + +Pushdown benchmark for Timestamp + + +Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6 +Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz + +Select 1 timestamp stored as INT96 row (value = CAST(7864320 AS timestamp)): Best/Avg Time(ms)Rate(M/s) Per Row(ns) Relative --- End diff -- OK. I'll send a follow-up PR.
[GitHub] spark pull request #21741: [SPARK-24718][SQL] Timestamp support pushdown to ...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/21741#discussion_r202277658 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala --- @@ -517,7 +585,6 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex } } - --- End diff -- OK
[GitHub] spark pull request #21741: [SPARK-24718][SQL] Timestamp support pushdown to ...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/21741#discussion_r202277483 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -378,6 +378,15 @@ object SQLConf { .booleanConf .createWithDefault(true) + val PARQUET_FILTER_PUSHDOWN_TIMESTAMP_ENABLED = +buildConf("spark.sql.parquet.filterPushdown.timestamp") + .doc("If true, enables Parquet filter push-down optimization for Timestamp. " + +"This configuration only has an effect when 'spark.sql.parquet.filterPushdown' is " + +"enabled and Timestamp stored as TIMESTAMP_MICROS or TIMESTAMP_MILLIS type.") --- End diff -- I think end users have a better understanding of `TIMESTAMP_MICROS` and `TIMESTAMP_MILLIS`.
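For context on this SPARK-24718 thread, a hedged sketch of how the settings under discussion would fit together in a session. The first two keys come from the diff above; the `outputTimestampType` key is an assumption about the related writer setting (named from memory, not from this diff), since the new flag only takes effect for the `TIMESTAMP_MICROS`/`TIMESTAMP_MILLIS` representations:

```scala
// Sketch only: assumes a running SparkSession bound to `spark`.
spark.conf.set("spark.sql.parquet.filterPushdown", "true")            // prerequisite flag
spark.conf.set("spark.sql.parquet.filterPushdown.timestamp", "true")  // flag added in this PR's diff
// Push-down applies only to timestamps stored as TIMESTAMP_MICROS or
// TIMESTAMP_MILLIS; the INT96 representation is not eligible.
// (Assumed writer setting, see lead-in above.)
spark.conf.set("spark.sql.parquet.outputTimestampType", "TIMESTAMP_MICROS")
```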
[GitHub] spark issue #21698: [SPARK-23243][Core] Fix RDD.repartition() data correctne...
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/21698 @jiangxb1987 Data loss occurs because a re-execution of zip might generate a key for which the corresponding reducer has already finished. Hence re-execution of the stage will not cause the subsequent child stage's reducer partition to be re-executed, resulting in data loss.
[GitHub] spark issue #21698: [SPARK-23243][Core] Fix RDD.repartition() data correctne...
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/21698 @cloud-fan That depends on what the computeKey is doing - which is user defined. It can have different values, or it need not (again, depends on user data and closure being applied).
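The nondeterminism discussed in this SPARK-23243 thread can be illustrated outside Spark: round-robin repartitioning derives each record's target partition from its position in the input, so a re-executed task that sees its input in a different order routes records to different reducer partitions. A toy sketch in plain Scala (hypothetical `roundRobin` helper; not Spark code):

```scala
// Assign each record to a partition by its position (round-robin), the way
// repartition() distributes records without a key.
def roundRobin(records: Seq[String], numPartitions: Int): Map[Int, Seq[String]] =
  records.zipWithIndex
    .groupBy { case (_, i) => i % numPartitions }
    .map { case (p, rs) => p -> rs.map(_._1) }

val firstRun   = roundRobin(Seq("a", "b", "c", "d"), 2)
// Same data, different arrival order on re-execution:
val reExecuted = roundRobin(Seq("b", "a", "c", "d"), 2)
// "a" lands in partition 0 on the first run but partition 1 on re-execution;
// if partition 0's reducer already finished, the re-run can lose "a" entirely.
```

This is the scenario mridulm describes: the record-to-partition mapping is not a pure function of the data, so a partially re-executed upstream stage cannot safely feed reducers that already completed.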
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r202271473 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -579,6 +579,18 @@ class CodegenContext { s"${fullName}_$id" } + /** + * Creates an `ExprValue` representing a local java variable of required data type. + */ + def freshVariable(name: String, dt: DataType): VariableValue = +JavaCode.variable(freshName(name), dt) + + /** + * Creates an `ExprValue` representing a local java variable of required data type. --- End diff -- nit: `data type` -> `Java class`
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r202269577 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -720,31 +719,36 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String private def writeMapToStringBuilder( kt: DataType, vt: DataType, - map: String, - buffer: String, - ctx: CodegenContext): String = { + map: ExprValue, + buffer: ExprValue, + ctx: CodegenContext): Block = { def dataToStringFunc(func: String, dataType: DataType) = { val funcName = ctx.freshName(func) val dataToStringCode = castToStringCode(dataType, ctx) + val data = JavaCode.variable("data", dataType) + val dataStr = JavaCode.variable("dataStr", StringType) ctx.addNewFunction(funcName, --- End diff -- Since this method `dataToStringFunc()` is not used in other files, it would be good to address it in this PR. WDYT?
[GitHub] spark issue #21758: [SPARK-24795][CORE] Implement barrier execution mode
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21758 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92965/ Test FAILed.
[GitHub] spark issue #21758: [SPARK-24795][CORE] Implement barrier execution mode
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21758 Merged build finished. Test FAILed.
[GitHub] spark issue #21758: [SPARK-24795][CORE] Implement barrier execution mode
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21758 **[Test build #92965 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92965/testReport)** for PR 21758 at commit [`c8d67e4`](https://github.com/apache/spark/commit/c8d67e434426d6f7c0b6ff4a9899096e40355325). * This patch **fails to generate documentation**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20611: [SPARK-23425][SQL]Support wildcard in HDFS path for load...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20611 Merged build finished. Test FAILed.
[GitHub] spark issue #20611: [SPARK-23425][SQL]Support wildcard in HDFS path for load...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20611 **[Test build #92966 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92966/testReport)** for PR 20611 at commit [`7900aaf`](https://github.com/apache/spark/commit/7900aaf7913a1b95527568ce54ff40f8a0c69148). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20611: [SPARK-23425][SQL]Support wildcard in HDFS path for load...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20611 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92966/ Test FAILed.
[GitHub] spark pull request #21741: [SPARK-24718][SQL] Timestamp support pushdown to ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21741#discussion_r202261810 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala --- @@ -517,7 +585,6 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex } } - --- End diff -- nit: I would revert this change if you are going to push more changes.
[GitHub] spark issue #21758: [SPARK-24795][CORE] Implement barrier execution mode
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21758 **[Test build #92965 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92965/testReport)** for PR 21758 at commit [`c8d67e4`](https://github.com/apache/spark/commit/c8d67e434426d6f7c0b6ff4a9899096e40355325).
[GitHub] spark issue #20611: [SPARK-23425][SQL]Support wildcard in HDFS path for load...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20611 **[Test build #92966 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92966/testReport)** for PR 20611 at commit [`7900aaf`](https://github.com/apache/spark/commit/7900aaf7913a1b95527568ce54ff40f8a0c69148).
[GitHub] spark issue #21758: [SPARK-24795][CORE] Implement barrier execution mode
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21758 Merged build finished. Test PASSed.
[GitHub] spark issue #21758: [SPARK-24795][CORE] Implement barrier execution mode
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21758 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/921/ Test PASSed.
[GitHub] spark pull request #21741: [SPARK-24718][SQL] Timestamp support pushdown to ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21741#discussion_r202261386
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -378,6 +378,15 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)

+  val PARQUET_FILTER_PUSHDOWN_TIMESTAMP_ENABLED =
+    buildConf("spark.sql.parquet.filterPushdown.timestamp")
+      .doc("If true, enables Parquet filter push-down optimization for Timestamp. " +
+        "This configuration only has an effect when 'spark.sql.parquet.filterPushdown' is " +
+        "enabled and Timestamp stored as TIMESTAMP_MICROS or TIMESTAMP_MILLIS type.")
--- End diff --
Shall we note `INT64` here?
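For context, the flag discussed in the diff above is an ordinary boolean SQL conf; once merged it can be toggled per session like any other. A minimal sketch of enabling it (the conf key comes from the diff above; the local-mode session setup is illustrative only):

```scala
// Illustrative only: toggling the proposed timestamp pushdown flag.
// Per the .doc() text above, it only takes effect when the parent flag
// "spark.sql.parquet.filterPushdown" is also enabled.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("pushdown-demo")
  .getOrCreate()

spark.conf.set("spark.sql.parquet.filterPushdown", "true")
spark.conf.set("spark.sql.parquet.filterPushdown.timestamp", "true")
```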
[GitHub] spark issue #21565: [SPARK-24558][Core]wrong Idle Timeout value is used in c...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21565 thanks, merging to master!
[GitHub] spark issue #21745: [SPARK-24781][SQL] Using a reference from Dataset in Fil...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21745 Merged build finished. Test PASSed.
[GitHub] spark issue #20100: [SPARK-22913][SQL] Improved Hive Partition Pruning
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20100 @ameent BTW, we can't close this directly. I'd appreciate it if you could close it manually.
[GitHub] spark issue #21745: [SPARK-24781][SQL] Using a reference from Dataset in Fil...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21745 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/920/ Test PASSed.
[GitHub] spark issue #20100: [SPARK-22913][SQL] Improved Hive Partition Pruning
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20100 Sorry for the late response; I am now going through the PRs queued in my list. I agree with @cloud-fan for now, and I think it is better to leave this closed.
[GitHub] spark issue #21741: [SPARK-24718][SQL] Timestamp support pushdown to parquet...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21741 LGTM
[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20057 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92963/ Test FAILed.
[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20057 **[Test build #92963 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92963/testReport)** for PR 20057 at commit [`bc75051`](https://github.com/apache/spark/commit/bc75051f5f4a47ef045c93bd933b2f95635100ad).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20057 Merged build finished. Test FAILed.
[GitHub] spark pull request #21741: [SPARK-24718][SQL] Timestamp support pushdown to ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21741#discussion_r202260518
--- Diff: sql/core/benchmarks/FilterPushdownBenchmark-results.txt ---
@@ -578,3 +578,127 @@
 Native ORC Vectorized                          11622 / 12196          1.4 7
 Native ORC Vectorized (Pushdown)               11377 / 11654          1.4          723.3       1.0X
+
+Pushdown benchmark for Timestamp
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
+
+Select 1 timestamp stored as INT96 row (value = CAST(7864320 AS timestamp)): Best/Avg Time(ms)    Rate(M/s)    Per Row(ns)    Relative
--- End diff --
shall we add a new line after the benchmark name? e.g.
```
Select 1 timestamp stored as INT96 row (value = CAST(7864320 AS timestamp)):
Best/Avg Time(ms)    Rate(M/s)    Per Row(ns)    Relative
...
```
We can send a follow-up PR to fix this entire file.
[GitHub] spark pull request #18544: [SPARK-21318][SQL]Improve exception message throw...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18544#discussion_r202259987
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala ---
@@ -129,14 +129,14 @@ private[sql] class HiveSessionCatalog(
     Try(super.lookupFunction(funcName, children)) match {
       case Success(expr) => expr
       case Failure(error) =>
-        if (functionRegistry.functionExists(funcName)) {
-          // If the function actually exists in functionRegistry, it means that there is an
-          // error when we create the Expression using the given children.
+        if (super.functionExists(name)) {
+          // If the function actually exists in functionRegistry or externalCatalog,
+          // it means that there is an error when we create the Expression using the given children.
           // We need to throw the original exception.
           throw error
         } else {
-          // This function is not in functionRegistry, let's try to load it as a Hive's
-          // built-in function.
+          // This function is not in functionRegistry or externalCatalog,
+          // let's try to load it as a Hive's built-in function.
           // Hive is case insensitive.
           val functionName = funcName.unquotedString.toLowerCase(Locale.ROOT)
           if (!hiveFunctions.contains(functionName)) {
--- End diff --
We do not need to change the other parts. We just need to throw the exception in `failFunctionLookup(funcName)`, right?
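The lookup-with-fallback pattern under discussion (try the catalog first, and only fall back to Hive built-ins when the function is genuinely absent) can be sketched in isolation. All names below are illustrative stand-ins, not Spark's actual catalog API:

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical stand-ins for the catalog pieces discussed above.
val registered = Set("upper", "lower")    // plays the role of functionRegistry/externalCatalog
val hiveBuiltins = Set("hex", "unhex")    // plays the role of Hive built-in functions

def lookupFunction(name: String): String =
  Try {
    // Primary lookup: fails either because the function is unknown
    // or because expression construction itself failed.
    require(registered.contains(name), s"cannot build expression for $name")
    s"expr:$name"
  } match {
    case Success(expr) => expr
    case Failure(error) =>
      if (registered.contains(name)) {
        // The function exists, so the failure came from building the
        // expression: surface the original error, not "function not found".
        throw error
      } else if (hiveBuiltins.contains(name.toLowerCase)) {
        // Not registered anywhere: fall back to the built-in set.
        s"hive:$name"
      } else {
        throw new NoSuchElementException(s"Undefined function: $name")
      }
  }
```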
[GitHub] spark pull request #18544: [SPARK-21318][SQL]Improve exception message throw...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18544#discussion_r202257183
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1155,7 +1155,8 @@ class Analyzer(
   override def apply(plan: LogicalPlan): LogicalPlan = plan.transformAllExpressions {
     case f: UnresolvedFunction if !catalog.functionExists(f.name) => withPosition(f) {
-      throw new NoSuchFunctionException(f.name.database.getOrElse("default"), f.name.funcName)
+      val db = f.name.database.getOrElse(catalog.getCurrentDatabase)
+      throw new NoSuchFunctionException(db, f.name.funcName)
--- End diff --
The issue has been resolved. Can you revert the changes?
[GitHub] spark issue #21745: [SPARK-24781][SQL] Using a reference from Dataset in Fil...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21745 **[Test build #92964 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92964/testReport)** for PR 21745 at commit [`9e00db9`](https://github.com/apache/spark/commit/9e00db938ddc6293899170e19b41530b22fb525a).
[GitHub] spark issue #21745: [SPARK-24781][SQL] Using a reference from Dataset in Fil...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21745 retest this please.
[GitHub] spark issue #21745: [SPARK-24781][SQL] Using a reference from Dataset in Fil...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21745 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92961/ Test FAILed.
[GitHub] spark issue #21745: [SPARK-24781][SQL] Using a reference from Dataset in Fil...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21745 Merged build finished. Test FAILed.
[GitHub] spark issue #21745: [SPARK-24781][SQL] Using a reference from Dataset in Fil...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21745 **[Test build #92961 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92961/testReport)** for PR 21745 at commit [`9e00db9`](https://github.com/apache/spark/commit/9e00db938ddc6293899170e19b41530b22fb525a).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.