[GitHub] spark issue #21110: [SPARK-24029][core] Set SO_REUSEADDR on listen sockets.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21110 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89667/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21110: [SPARK-24029][core] Set SO_REUSEADDR on listen sockets.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21110 Merged build finished. Test PASSed.
[GitHub] spark issue #21110: [SPARK-24029][core] Set SO_REUSEADDR on listen sockets.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21110 **[Test build #89667 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89667/testReport)** for PR 21110 at commit [`7c84e1d`](https://github.com/apache/spark/commit/7c84e1d4c5d9d3c90454a1060d12f3667809d71c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API for PIC
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21119 Merged build finished. Test FAILed.
[GitHub] spark issue #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API for PIC
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21119 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89672/ Test FAILed.
[GitHub] spark issue #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API for PIC
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21119 **[Test build #89672 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89672/testReport)** for PR 21119 at commit [`53d7763`](https://github.com/apache/spark/commit/53d7763b58d05a6baf9fcf1cef2ae327a5d42e04). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class _PowerIterationClusteringParams(JavaParams, HasMaxIter, HasPredictionCol):` * `class PowerIterationClustering(JavaTransformer, _PowerIterationClusteringParams, JavaMLReadable,`
[GitHub] spark issue #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API for PIC
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21119 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2559/ Test PASSed.
[GitHub] spark issue #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API for PIC
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21119 Merged build finished. Test PASSed.
[GitHub] spark issue #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API for PIC
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21119 **[Test build #89672 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89672/testReport)** for PR 21119 at commit [`53d7763`](https://github.com/apache/spark/commit/53d7763b58d05a6baf9fcf1cef2ae327a5d42e04).
[GitHub] spark pull request #21119: [SPARK-19826][ML][PYTHON]add spark.ml Python API ...
GitHub user huaxingao opened a pull request:

    https://github.com/apache/spark/pull/21119

    [SPARK-19826][ML][PYTHON]add spark.ml Python API for PIC

    ## What changes were proposed in this pull request?

    add spark.ml Python API for PIC

    ## How was this patch tested?

    add doctest

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/huaxingao/spark spark_19826

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21119.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21119

commit 53d7763b58d05a6baf9fcf1cef2ae327a5d42e04
Author: Huaxin Gao
Date:   2018-04-21T04:15:37Z

    [SPARK-19826][ML][PYTHON]add spark.ml Python API for PIC
[GitHub] spark issue #21018: [SPARK-23880][SQL] Do not trigger any jobs for caching d...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21018 @cloud-fan @viirya could you check this? Thanks!
[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21102 Merged build finished. Test PASSed.
[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21102 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2558/ Test PASSed.
[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21102 **[Test build #89671 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89671/testReport)** for PR 21102 at commit [`cd56b7d`](https://github.com/apache/spark/commit/cd56b7dcecf8228cb92ac40e028ac35d028065f5).
[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/20930 @Ngone51 Ah, I think I see how the description misled you. In item 5 of the description, 'this stage' refers to 'Stage 2' in the screenshot. Thanks for checking; I have modified the description to avoid misleading others.
[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/20930 @Ngone51 Please check the screenshot in detail: stage 2's shuffleId is 1, but stage 3 failed because of a missing output for shuffle '0'! So skipping stage 2 caused stage 3 to get the wrong shuffleId. The root cause is what this patch wants to fix: there should be missing tasks, but in fact there are none.
[GitHub] spark pull request #20930: [SPARK-23811][Core] FetchFailed comes before Succ...
Github user xuanyuanking commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20930#discussion_r183198368

    --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
    @@ -1266,6 +1266,9 @@ class DAGScheduler(
                 }
                 if (failedEpoch.contains(execId) && smt.epoch <= failedEpoch(execId)) {
                   logInfo(s"Ignoring possibly bogus $smt completion from executor $execId")
    +            } else if (failedStages.contains(shuffleStage)) {
    --- End diff --

    > Sorry I may nitpick here.

    No, that's necessary. I should have made sure about this; thanks for your advice! :)

    > Can you simulate what happens to result task if FetchFailed comes before task success?

    Sure, but it may be hard to reproduce this in a real environment, so I'll try to fake it in a unit test first, ASAP.
[GitHub] spark issue #21021: [SPARK-23921][SQL] Add array_sort function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21021 **[Test build #89670 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89670/testReport)** for PR 21021 at commit [`172b2c5`](https://github.com/apache/spark/commit/172b2c520b6d8c3df6778ecb07ee6eca5b4b568d).
[GitHub] spark issue #21021: [SPARK-23921][SQL] Add array_sort function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21021 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2557/ Test PASSed.
[GitHub] spark issue #21021: [SPARK-23921][SQL] Add array_sort function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21021 Merged build finished. Test PASSed.
[GitHub] spark issue #21021: [SPARK-23921][SQL] Add array_sort function
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/21021 retest this please
[GitHub] spark issue #21018: [SPARK-23880][SQL] Do not trigger any jobs for caching d...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21018 Merged build finished. Test PASSed.
[GitHub] spark issue #21018: [SPARK-23880][SQL] Do not trigger any jobs for caching d...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21018 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89666/ Test PASSed.
[GitHub] spark issue #21018: [SPARK-23880][SQL] Do not trigger any jobs for caching d...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21018 **[Test build #89666 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89666/testReport)** for PR 21018 at commit [`80f3b34`](https://github.com/apache/spark/commit/80f3b34db7d4c9a49aba47a107975d33e6eab8dd). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class CachedRDDBuilder(` * `case class InMemoryRelation(`
[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21031 Merged build finished. Test PASSed.
[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21031 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2556/ Test PASSed.
[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21031 **[Test build #89669 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89669/testReport)** for PR 21031 at commit [`dd46bbf`](https://github.com/apache/spark/commit/dd46bbf0379a52cf18315b88e4be374e0d8b1956).
[GitHub] spark issue #20636: [SPARK-23415][SQL][TEST] Make behavior of BufferHolderSp...
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/20636

    @hvanhovell When I added [the new check code](https://github.com/apache/spark/pull/20636/files#diff-e68c5a074209b9a20ee2aa42936571ceR64) to check whether the growth value is negative, I saw the following error. It shows that `Integer.MAX_VALUE` ends up being changed to a negative value. How should we handle this? Should we pass `Integer.MAX_VALUE - n` (where n is 64 or something) instead of `Integer.MAX_VALUE`? WDYT?
    ```
    Exception in thread "main" java.lang.UnsupportedOperationException: Cannot grow BufferHolder by size -2147483648 because the size is nevative
        at org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder.grow(BufferHolder.java:65)
        at org.apache.spark.sql.catalyst.expressions.codegen.BufferHolderSparkSubmitSuite$.main(BufferHolderSparkSubmitSuite.scala:69)
        at org.apache.spark.sql.catalyst.expressions.codegen.BufferHolderSparkSubmitSuite.main(BufferHolderSparkSubmitSuite.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:838)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:166)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:193)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:85)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:913)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:924)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    ```
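The size `-2147483648` in the error above is exactly what plain `int` arithmetic yields when `Integer.MAX_VALUE` is rounded up to a multiple of 8. The comment does not say where the wrap-around happens inside Spark, so the Java sketch below is only an illustration under that assumption; `roundUpTo8` and `roundUpTo8Checked` are hypothetical helpers, not Spark APIs.

```java
// Illustrative only: shows how an int near Integer.MAX_VALUE can wrap to a
// negative value. The helper names are hypothetical and not part of Spark.
final class IntOverflowSketch {

    // Rounds n up to the next multiple of 8 with ordinary int arithmetic.
    // For n close to Integer.MAX_VALUE, n + 7 silently wraps around.
    static int roundUpTo8(int n) {
        return (n + 7) & ~7;
    }

    // Same rounding, but Math.addExact throws ArithmeticException on
    // overflow instead of wrapping.
    static int roundUpTo8Checked(int n) {
        return Math.addExact(n, 7) & ~7;
    }

    public static void main(String[] args) {
        // Wraps around: prints -2147483648
        System.out.println(roundUpTo8(Integer.MAX_VALUE));

        // Leaving headroom (here n = 64, as suggested in the comment)
        // keeps the rounded value positive.
        System.out.println(roundUpTo8(Integer.MAX_VALUE - 64));

        try {
            roundUpTo8Checked(Integer.MAX_VALUE);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected: " + e.getMessage());
        }
    }
}
```

Under this assumption, passing `Integer.MAX_VALUE - n` only avoids the wrap-around if `n` is at least as large as any padding added later, whereas a checked addition reports the overflow regardless of the input.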
[GitHub] spark issue #21021: [SPARK-23921][SQL] Add array_sort function
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/21021 retest this please
[GitHub] spark pull request #21031: [SPARK-23923][SQL] Add cardinality function
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21031#discussion_r183196830

    --- Diff: python/pyspark/sql/functions.py ---
    @@ -2124,6 +2124,21 @@ def size(col):
         return Column(sc._jvm.functions.size(_to_java_column(col)))
     
     
    +@since(2.4)
    +def cardinality(col):
    --- End diff --

    I see.
[GitHub] spark issue #21117: [followup][SPARK-10399][SPARK-23879][Core] Free unused o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21117 Merged build finished. Test PASSed.
[GitHub] spark issue #21117: [followup][SPARK-10399][SPARK-23879][Core] Free unused o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21117 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2555/ Test PASSed.
[GitHub] spark issue #21117: [followup][SPARK-10399][SPARK-23879][Core] Free unused o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21117 **[Test build #89668 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89668/testReport)** for PR 21117 at commit [`2c1575d`](https://github.com/apache/spark/commit/2c1575da5aa9ec1a7b3a8ec29457b46ecba3330e).
[GitHub] spark issue #21117: [followup][SPARK-10399][SPARK-23879][Core] Free unused o...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/21117 retest this please
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20959 Merged build finished. Test PASSed.
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20959 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89664/ Test PASSed.
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20959 **[Test build #89664 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89664/testReport)** for PR 20959 at commit [`257b363`](https://github.com/apache/spark/commit/257b3638ae0db7051dd25affcaf8967a5a29db5d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21110: [SPARK-24029][core] Set SO_REUSEADDR on listen sockets.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21110 Merged build finished. Test PASSed.
[GitHub] spark issue #21110: [SPARK-24029][core] Set SO_REUSEADDR on listen sockets.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21110 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2554/ Test PASSed.
[GitHub] spark issue #21110: [SPARK-24029][core] Set SO_REUSEADDR on listen sockets.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21110 **[Test build #89667 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89667/testReport)** for PR 21110 at commit [`7c84e1d`](https://github.com/apache/spark/commit/7c84e1d4c5d9d3c90454a1060d12f3667809d71c).
[GitHub] spark issue #21110: [SPARK-24029][core] Set SO_REUSEADDR on listen sockets.
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21110 retest this please
[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/20345 @cloud-fan @wzhfy ping
[GitHub] spark pull request #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet...
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21052#discussion_r183190383

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala ---
    @@ -382,4 +382,34 @@ class StatisticsCollectionSuite extends StatisticsCollectionTestBase with Shared
           }
         }
       }
    +
    +  test("Simple queries must be working, if CBO is turned on") {
    +    withSQLConf(("spark.sql.cbo.enabled", "true")) {
    +      withTable("TBL1", "TBL") {
    +        import org.apache.spark.sql.functions._
    +        val df = spark.range(1000L).select('id,
    +          'id * 2 as "FLD1",
    +          'id * 12 as "FLD2",
    +          lit("aaa") + 'id as "fld3")
    +        df.write
    +          .mode(SaveMode.Overwrite)
    +          .bucketBy(10, "id", "FLD1", "FLD2")
    +          .sortBy("id", "FLD1", "FLD2")
    +          .saveAsTable("TBL")
    +        spark.sql("ANALYZE TABLE TBL COMPUTE STATISTICS ")
    --- End diff --

    nit: you don't need the `spark.` prefix
[GitHub] spark pull request #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet...
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21052#discussion_r183190432

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala ---
    @@ -382,4 +382,34 @@ class StatisticsCollectionSuite extends StatisticsCollectionTestBase with Shared
           }
         }
       }
    +
    +  test("Simple queries must be working, if CBO is turned on") {
    +    withSQLConf(("spark.sql.cbo.enabled", "true")) {
    +      withTable("TBL1", "TBL") {
    +        import org.apache.spark.sql.functions._
    +        val df = spark.range(1000L).select('id,
    +          'id * 2 as "FLD1",
    +          'id * 12 as "FLD2",
    +          lit("aaa") + 'id as "fld3")
    +        df.write
    +          .mode(SaveMode.Overwrite)
    +          .bucketBy(10, "id", "FLD1", "FLD2")
    +          .sortBy("id", "FLD1", "FLD2")
    +          .saveAsTable("TBL")
    +        spark.sql("ANALYZE TABLE TBL COMPUTE STATISTICS ")
    +        spark.sql("ANALYZE TABLE TBL COMPUTE STATISTICS FOR COLUMNS ID, FLD1, FLD2, FLD3")
    +        val df2 = spark.sql(
    +          """
    +              SELECT t1.id, t1.fld1, t1.fld2, t1.fld3
    +              FROM tbl t1
    +              JOIN tbl t2 on t1.id=t2.id
    +              WHERE t1.fld3 IN (-123.23,321.23)
    +          """.stripMargin)
    +        df2.createTempView("TBL2")
    +        spark.sql("SELECT * FROM tbl2 WHERE fld3 IN ('qqq', 'qwe') ").explain()
    --- End diff --

    Why is this `explain()` called?
[GitHub] spark pull request #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet...
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21052#discussion_r183190234

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala ---
    @@ -382,4 +382,34 @@ class StatisticsCollectionSuite extends StatisticsCollectionTestBase with Shared
           }
         }
       }
    +
    +  test("Simple queries must be working, if CBO is turned on") {
    +    withSQLConf(("spark.sql.cbo.enabled", "true")) {
    +      withTable("TBL1", "TBL") {
    +        import org.apache.spark.sql.functions._
    +        val df = spark.range(1000L).select('id,
    +          'id * 2 as "FLD1",
    +          'id * 12 as "FLD2",
    +          lit("aaa") + 'id as "fld3")
    +        df.write
    +          .mode(SaveMode.Overwrite)
    +          .bucketBy(10, "id", "FLD1", "FLD2")
    +          .sortBy("id", "FLD1", "FLD2")
    +          .saveAsTable("TBL")
    +        spark.sql("ANALYZE TABLE TBL COMPUTE STATISTICS ")
    +        spark.sql("ANALYZE TABLE TBL COMPUTE STATISTICS FOR COLUMNS ID, FLD1, FLD2, FLD3")
    +        val df2 = spark.sql(
    +          """
    +              SELECT t1.id, t1.fld1, t1.fld2, t1.fld3
    +              FROM tbl t1
    +              JOIN tbl t2 on t1.id=t2.id
    +              WHERE t1.fld3 IN (-123.23,321.23)
    +          """.stripMargin)
    +        df2.createTempView("TBL2")
    +        spark.sql("SELECT * FROM tbl2 WHERE fld3 IN ('qqq', 'qwe') ").explain()
    +      }
    +    }
    +
    +  }
    +
    --- End diff --

    ditto
[GitHub] spark pull request #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet...
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21052#discussion_r183190221

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala ---
    @@ -382,4 +382,34 @@ class StatisticsCollectionSuite extends StatisticsCollectionTestBase with Shared
           }
         }
       }
    +
    +  test("Simple queries must be working, if CBO is turned on") {
    +    withSQLConf(("spark.sql.cbo.enabled", "true")) {
    +      withTable("TBL1", "TBL") {
    +        import org.apache.spark.sql.functions._
    +        val df = spark.range(1000L).select('id,
    +          'id * 2 as "FLD1",
    +          'id * 12 as "FLD2",
    +          lit("aaa") + 'id as "fld3")
    +        df.write
    +          .mode(SaveMode.Overwrite)
    +          .bucketBy(10, "id", "FLD1", "FLD2")
    +          .sortBy("id", "FLD1", "FLD2")
    +          .saveAsTable("TBL")
    +        spark.sql("ANALYZE TABLE TBL COMPUTE STATISTICS ")
    +        spark.sql("ANALYZE TABLE TBL COMPUTE STATISTICS FOR COLUMNS ID, FLD1, FLD2, FLD3")
    +        val df2 = spark.sql(
    +          """
    +              SELECT t1.id, t1.fld1, t1.fld2, t1.fld3
    +              FROM tbl t1
    +              JOIN tbl t2 on t1.id=t2.id
    +              WHERE t1.fld3 IN (-123.23,321.23)
    +          """.stripMargin)
    +        df2.createTempView("TBL2")
    +        spark.sql("SELECT * FROM tbl2 WHERE fld3 IN ('qqq', 'qwe') ").explain()
    +      }
    +    }
    +
    --- End diff --

    nit: drop this line
[GitHub] spark pull request #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet...
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21052#discussion_r183190136

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala ---
    @@ -382,4 +382,34 @@ class StatisticsCollectionSuite extends StatisticsCollectionTestBase with Shared
           }
         }
       }
    +
    +  test("Simple queries must be working, if CBO is turned on") {
    +    withSQLConf(("spark.sql.cbo.enabled", "true")) {
    --- End diff --

    nit: `withSQLConf(SQLConf.CBO_ENABLED.key -> "true")`
[GitHub] spark issue #21040: [SPARK-23930][SQL] Add slice function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21040 Merged build finished. Test PASSed.
[GitHub] spark issue #21040: [SPARK-23930][SQL] Add slice function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21040 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89657/ Test PASSed.
[GitHub] spark issue #21040: [SPARK-23930][SQL] Add slice function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21040 **[Test build #89657 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89657/testReport)** for PR 21040 at commit [`b94d067`](https://github.com/apache/spark/commit/b94d067d3358c96a638dbe5c4fbb7270def453c3). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet produc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21052 Merged build finished. Test PASSed.
[GitHub] spark issue #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet produc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21052 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89656/ Test PASSed.
[GitHub] spark issue #21052: [SPARK-23799][SQL] FilterEstimation.evaluateInSet produc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21052 **[Test build #89656 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89656/testReport)** for PR 21052 at commit [`0faa789`](https://github.com/apache/spark/commit/0faa789a2e040c90c8add1ba93bd8618b1988d8a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20938: [SPARK-23821][SQL] Collection function: flatten
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20938 Merged build finished. Test PASSed.
[GitHub] spark issue #20938: [SPARK-23821][SQL] Collection function: flatten
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20938 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89658/ Test PASSed.
[GitHub] spark issue #20938: [SPARK-23821][SQL] Collection function: flatten
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20938 **[Test build #89658 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89658/testReport)** for PR 20938 at commit [`939fc23`](https://github.com/apache/spark/commit/939fc238b4a5616a2b254640acc64703ac9b3cf1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21018: [SPARK-23880][SQL] Do not trigger any jobs for caching d...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21018 Merged build finished. Test PASSed.
[GitHub] spark issue #21018: [SPARK-23880][SQL] Do not trigger any jobs for caching d...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21018 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2553/ Test PASSed.
[GitHub] spark issue #21018: [SPARK-23880][SQL] Do not trigger any jobs for caching d...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21018 **[Test build #89666 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89666/testReport)** for PR 21018 at commit [`80f3b34`](https://github.com/apache/spark/commit/80f3b34db7d4c9a49aba47a107975d33e6eab8dd).
[GitHub] spark pull request #20965: [SPARK-21870][SQL] Split aggregation code into sm...
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20965#discussion_r183185144

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala ---
    @@ -254,6 +256,80 @@ case class HashAggregateExec(
         """.stripMargin
       }

    +  // Extracts all the input variable references for a given `aggExpr`. This result will be used
    +  // to split aggregation into small functions.
    +  private def getInputVariableReferences(
    +      context: CodegenContext,
    +      aggregateExpression: Expression,
    +      subExprs: Map[Expression, SubExprEliminationState]): Seq[(String, String, Expression)] = {
    +    // `argMap` collects all the pairs of variable names and their types, the first in the pair
    +    // is a type name and the second is a variable name.
    +    val argMap = mutable.Map[(String, String), Expression]()
    +    val stack = mutable.Stack[Expression](aggregateExpression)
    +    while (stack.nonEmpty) {
    +      stack.pop() match {
    +        case e if subExprs.contains(e) =>
    +          val exprCode = subExprs(e)
    +          if (CodeGenerator.isJavaIdentifier(exprCode.value)) {
    +            argMap += (CodeGenerator.javaType(e.dataType), exprCode.value) -> e
    +          }
    +          if (CodeGenerator.isJavaIdentifier(exprCode.isNull)) {
    +            argMap += ("boolean", exprCode.isNull) -> e
    +          }
    +          // Since the children possibly has common expressions, we push them here
    +          stack.pushAll(e.children)
    +        case ref: BoundReference
    +          if context.currentVars != null && context.currentVars(ref.ordinal) != null =>
    +          val value = context.currentVars(ref.ordinal).value
    +          val isNull = context.currentVars(ref.ordinal).isNull
    +          if (CodeGenerator.isJavaIdentifier(value)) {
    +            argMap += (CodeGenerator.javaType(ref.dataType), value) -> ref
    +          }
    +          if (CodeGenerator.isJavaIdentifier(isNull)) {
    +            argMap += ("boolean", isNull) -> ref
    +          }
    +        case ref: BoundReference =>
    +          argMap += ("InternalRow", context.INPUT_ROW) -> ref
    +        case e =>
    +          stack.pushAll(e.children)
    +      }
    +    }
    +
    +    argMap.map { case ((tpe, name), e) => (tpe, name, e) }.toSeq
    +  }
    +
    +  // Splits aggregate code into small functions because JVMs does not compile too long functions
    +  private def splitAggregateExpressions(
    --- End diff --

    ok, I'll recheck.
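The traversal in the quoted diff can be tried outside Spark with a toy expression tree. The following is only a minimal sketch of the same idea (an explicit stack walk that gathers `(type, name)` pairs into a map so each generated variable is passed once). `Expr`, `Ref`, `Add`, `Vars`, `InputRowName` and `collectArgs` are hypothetical stand-ins rather than Catalyst classes, and the `isJavaIdentifier` checks and the common-subexpression case from the diff are left out for brevity.

```scala
import scala.collection.mutable

object ArgCollectionSketch {
  // Toy expression tree; hypothetical stand-ins, not Spark's Catalyst classes.
  sealed trait Expr { def children: Seq[Expr] }
  final case class Ref(ordinal: Int, javaType: String) extends Expr {
    val children: Seq[Expr] = Nil
  }
  final case class Add(left: Expr, right: Expr) extends Expr {
    val children: Seq[Expr] = Seq(left, right)
  }

  // Names of the generated variables holding an input column (hypothetical).
  final case class Vars(value: String, isNull: String)

  // Placeholder name for the input row variable (an assumption for this sketch).
  val InputRowName = "inputRow"

  def collectArgs(root: Expr, currentVars: Map[Int, Vars]): Seq[(String, String, Expr)] = {
    // Keyed by (type name, variable name), so a variable referenced twice appears once.
    // LinkedHashMap keeps insertion order, which makes the result deterministic.
    val argMap = mutable.LinkedHashMap[(String, String), Expr]()
    var stack: List[Expr] = List(root)
    while (stack.nonEmpty) {
      val e = stack.head
      stack = stack.tail
      e match {
        case ref @ Ref(ordinal, javaType) if currentVars.contains(ordinal) =>
          // The column is already materialized in local variables: pass those.
          val vars = currentVars(ordinal)
          argMap((javaType, vars.value)) = ref
          argMap(("boolean", vars.isNull)) = ref
        case ref: Ref =>
          // Otherwise the split function needs the whole input row.
          argMap(("InternalRow", InputRowName)) = ref
        case other =>
          // Not a leaf: keep walking its children.
          stack = other.children.toList ::: stack
      }
    }
    argMap.map { case ((tpe, name), e) => (tpe, name, e) }.toSeq
  }
}
```

For example, `collectArgs(Add(Ref(0, "int"), Add(Ref(0, "int"), Ref(1, "long"))), Map(0 -> Vars("value_0", "isNull_0")))` yields one `int` and one `boolean` argument for column 0, although it is referenced twice, plus the input row for the unbound column 1.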
[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20146 Merged build finished. Test PASSed.
[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20146 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89650/ Test PASSed.
[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20146 **[Test build #89650 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89650/testReport)** for PR 20146 at commit [`50af02e`](https://github.com/apache/spark/commit/50af02eaccce7cecb7c3093d5bc14675ca860c22). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20068: [SPARK-17916][SQL] Fix empty string being parsed as null...
Github user aa8y commented on the issue: https://github.com/apache/spark/pull/20068 I apologize, I haven't had time to work on this. I can close this for now and reopen it once I have a working fix.
[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21118 Merged build finished. Test FAILed.
[GitHub] spark issue #21117: [followup][SPARK-10399][SPARK-23879][Core] Free unused o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21117 Merged build finished. Test FAILed.
[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21118 **[Test build #89665 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89665/testReport)** for PR 21118 at commit [`6006123`](https://github.com/apache/spark/commit/60061234d2f627755d9c946da410163de5458feb). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21118 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89665/ Test FAILed.
[GitHub] spark issue #21117: [followup][SPARK-10399][SPARK-23879][Core] Free unused o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21117 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89655/ Test FAILed.
[GitHub] spark issue #21117: [followup][SPARK-10399][SPARK-23879][Core] Free unused o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21117 **[Test build #89655 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89655/testReport)** for PR 21117 at commit [`2c1575d`](https://github.com/apache/spark/commit/2c1575da5aa9ec1a7b3a8ec29457b46ecba3330e). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21118 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2552/ Test PASSed.
[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21118 Merged build finished. Test PASSed.
[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21118 **[Test build #89665 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89665/testReport)** for PR 21118 at commit [`6006123`](https://github.com/apache/spark/commit/60061234d2f627755d9c946da410163de5458feb).
[GitHub] spark issue #20068: [SPARK-17916][SQL] Fix empty string being parsed as null...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20068 ping @aa8y @HyukjinKwon @MaxGekk @gengliangwang
[GitHub] spark issue #21116: [SPARK-24038][SS] Refactor continuous writing to its own...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21116 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89654/ Test PASSed.
[GitHub] spark issue #21116: [SPARK-24038][SS] Refactor continuous writing to its own...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21116 Merged build finished. Test PASSed.
[GitHub] spark issue #21116: [SPARK-24038][SS] Refactor continuous writing to its own...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21116 **[Test build #89654 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89654/testReport)** for PR 21116 at commit [`3d8dfa4`](https://github.com/apache/spark/commit/3d8dfa415902d1d7be45a36923d2d355936eefbe). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21103 Merged build finished. Test PASSed.
[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21103 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89651/ Test PASSed.
[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21103 **[Test build #89651 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89651/testReport)** for PR 21103 at commit [`ad9f576`](https://github.com/apache/spark/commit/ad9f576961384936548adbd38dfb60296d3f0389). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21108: [SPARK-24027][SQL] Support MapType with StringType for k...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21108 Merged build finished. Test PASSed.
[GitHub] spark issue #21108: [SPARK-24027][SQL] Support MapType with StringType for k...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21108 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89652/ Test PASSed.
[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21118 Yeah, we should probably add a projection. It's probably only working because the InternalRows that are produced are all UnsafeRow.
[GitHub] spark issue #21108: [SPARK-24027][SQL] Support MapType with StringType for k...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21108 **[Test build #89652 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89652/testReport)** for PR 21108 at commit [`7bfe231`](https://github.com/apache/spark/commit/7bfe23180d9dd4c2717d65f862b8a6fc13f7b22a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20636: [SPARK-23415][SQL][TEST] Make behavior of BufferHolderSp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20636 Merged build finished. Test FAILed.
[GitHub] spark issue #20636: [SPARK-23415][SQL][TEST] Make behavior of BufferHolderSp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20636 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89653/ Test FAILed.
[GitHub] spark issue #20636: [SPARK-23415][SQL][TEST] Make behavior of BufferHolderSp...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20636 **[Test build #89653 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89653/testReport)** for PR 20636 at commit [`f946631`](https://github.com/apache/spark/commit/f946631dd775becf510edfeb128c2d07d41eff39). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21069: [SPARK-23920][SQL]add array_remove to remove all element...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21069 Merged build finished. Test PASSed.
[GitHub] spark issue #21069: [SPARK-23920][SQL]add array_remove to remove all element...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21069 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89647/ Test PASSed.
[GitHub] spark issue #21069: [SPARK-23920][SQL]add array_remove to remove all element...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21069 **[Test build #89647 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89647/testReport)** for PR 21069 at commit [`60e8e2f`](https://github.com/apache/spark/commit/60e8e2f49b358cf29750b941c139c13177531e51). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21115: [SPARK-24033] [SQL] Fix Mismatched of Window Frame speci...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21115 Merged build finished. Test PASSed.
[GitHub] spark issue #21115: [SPARK-24033] [SQL] Fix Mismatched of Window Frame speci...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21115 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89648/ Test PASSed.
[GitHub] spark issue #21115: [SPARK-24033] [SQL] Fix Mismatched of Window Frame speci...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21115 **[Test build #89648 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89648/testReport)** for PR 21115 at commit [`8e4c921`](https://github.com/apache/spark/commit/8e4c92171dfebb063932764b5fecc6ec7f3de0a8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21118: SPARK-23325: Use InternalRow when reading with DataSourc...
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21118 Generally looks good. IIRC, there's some arcane reason why plan nodes need to produce UnsafeRow even though SparkPlan.execute() declares InternalRow. So we may need to add a projection in DataSourceV2ScanExec.
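The concern raised here (an interface typed as a general row while consumers quietly rely on one concrete layout) can be illustrated with a toy model. The sketch below does not use Spark; `Row`, `GenericRow`, `BinaryRow`, `sizeInBytes` and `toBinary` are hypothetical stand-ins for `InternalRow`, a non-unsafe row, `UnsafeRow`, a downstream consumer, and the projection, respectively. The thread does not say which operators depend on `UnsafeRow`, so the failing consumer is an assumption made purely for illustration.

```scala
import java.nio.ByteBuffer

object ProjectionSketch {
  // Hypothetical stand-ins for Spark's row types; not the real classes.
  trait Row { def getInt(i: Int): Int }

  // A row backed by plain JVM values.
  final class GenericRow(values: Array[Int]) extends Row {
    def getInt(i: Int): Int = values(i)
  }

  // A row backed by a flat byte array, four bytes per int field.
  final class BinaryRow(val bytes: Array[Byte]) extends Row {
    def getInt(i: Int): Int = ByteBuffer.wrap(bytes).getInt(i * 4)
  }

  // A consumer declared against Row that actually assumes the binary layout.
  def sizeInBytes(row: Row): Int = row.asInstanceOf[BinaryRow].bytes.length

  // The "projection": copies any Row with `arity` int fields into the binary layout.
  def toBinary(arity: Int)(row: Row): BinaryRow = row match {
    case b: BinaryRow => b // already in the expected layout
    case other =>
      val buf = ByteBuffer.allocate(arity * 4)
      (0 until arity).foreach(i => buf.putInt(other.getInt(i)))
      new BinaryRow(buf.array())
  }
}
```

Passing a `GenericRow` straight to `sizeInBytes` throws a `ClassCastException`, whereas `sizeInBytes(toBinary(2)(row))` works for any `Row` with two int fields. In this toy model, that conversion step is what a projection in the scan node would provide.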
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20959 **[Test build #89664 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89664/testReport)** for PR 20959 at commit [`257b363`](https://github.com/apache/spark/commit/257b3638ae0db7051dd25affcaf8967a5a29db5d).
[GitHub] spark issue #20933: [SPARK-23817][SQL]Migrate ORC file format read path to d...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20933 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89659/ Test FAILed.
[GitHub] spark issue #20933: [SPARK-23817][SQL]Migrate ORC file format read path to d...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20933 Merged build finished. Test FAILed.
[GitHub] spark issue #20933: [SPARK-23817][SQL]Migrate ORC file format read path to d...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20933 **[Test build #89659 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89659/testReport)** for PR 20933 at commit [`359f846`](https://github.com/apache/spark/commit/359f846112ba8c7ee9023b7754da4a907068b39b). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class FallBackFileDataSourceToV1(sparkSession: SparkSession) extends Rule[LogicalPlan] ` * `abstract class FileDataSourceV2 extends DataSourceV2 ` * `class OrcDataSourceV2 extends FileDataSourceV2 with ReadSupport with ReadSupportWithSchema `
[GitHub] spark issue #21070: [SPARK-23972][BUILD][SQL] Update Parquet to 1.10.0.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21070 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89644/ Test PASSed.
[GitHub] spark issue #21070: [SPARK-23972][BUILD][SQL] Update Parquet to 1.10.0.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21070 Merged build finished. Test PASSed.