[GitHub] spark issue #20596: [SPARK-23404][CORE]When the underlying buffers are direc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20596 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20596: [SPARK-23404][CORE]When the underlying buffers are direc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20596 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/840/ Test PASSed.
[GitHub] spark pull request #20596: [SPARK-23404][CORE]When the underlying buffers ar...
GitHub user 10110346 opened a pull request:

https://github.com/apache/spark/pull/20596

[SPARK-23404][CORE]When the underlying buffers are direct, we should copy them to the heap memory

## What changes were proposed in this pull request?

If the memory mode is `ON_HEAP` and the underlying buffers are direct, we should copy them to heap memory.

## How was this patch tested?

N/A

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/10110346/spark directtooffheap

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20596.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20596

commit 1f5d5ffbfe20c159fcf56d67ec230b05b06046a1
Author: liuxian
Date: 2018-02-13T07:36:08Z

    fix
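As a rough illustration of the idea in this pull request, the sketch below copies a direct `java.nio.ByteBuffer` into a heap-backed one and leaves heap buffers alone. The object name `DirectToHeap` and the method `toHeap` are hypothetical and are not taken from the patch; the actual change in Spark may be structured differently.

```scala
import java.nio.ByteBuffer

// Hypothetical helper, not code from the pull request.
object DirectToHeap {
  /** Returns `buf` itself if it is heap-backed, otherwise a heap copy of its remaining bytes. */
  def toHeap(buf: ByteBuffer): ByteBuffer = {
    if (!buf.isDirect) {
      buf
    } else {
      val heap = ByteBuffer.allocate(buf.remaining())
      // duplicate() shares the bytes but has its own position and limit,
      // so reading from it does not move the position of `buf`.
      heap.put(buf.duplicate())
      heap.flip()
      heap
    }
  }
}
```

The copy costs one extra allocation and a pass over the data, which is the trade-off such a change accepts in exchange for keeping `ON_HEAP` data on the heap.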
[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20382 **[Test build #87372 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87372/testReport)** for PR 20382 at commit [`f3fc90c`](https://github.com/apache/spark/commit/f3fc90cc94210f313861625b5a8fe6ef754c05bd).
[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20382 Merged build finished. Test PASSed.
[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20382 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/839/ Test PASSed.
[GitHub] spark issue #20546: [SPARK-20659][Core] Removing sc.getExecutorStorageStatus...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20546 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87361/ Test FAILed.
[GitHub] spark issue #20546: [SPARK-20659][Core] Removing sc.getExecutorStorageStatus...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20546 Merged build finished. Test FAILed.
[GitHub] spark issue #20546: [SPARK-20659][Core] Removing sc.getExecutorStorageStatus...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20546 **[Test build #87361 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87361/testReport)** for PR 20546 at commit [`543caf8`](https://github.com/apache/spark/commit/543caf879468a3ade8350934716443207d2eaeca).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20593: [SPARK-23230][SQL][BRANCH-2.2]When hive.default.fileform...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20593 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87365/ Test PASSed.
[GitHub] spark issue #20589: [SPARK-23394][UI] In RDD storage page show the executor ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20589 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87360/ Test PASSed.
[GitHub] spark issue #20593: [SPARK-23230][SQL][BRANCH-2.2]When hive.default.fileform...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20593 Merged build finished. Test PASSed.
[GitHub] spark issue #20589: [SPARK-23394][UI] In RDD storage page show the executor ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20589 Merged build finished. Test PASSed.
[GitHub] spark issue #20593: [SPARK-23230][SQL][BRANCH-2.2]When hive.default.fileform...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20593 **[Test build #87365 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87365/testReport)** for PR 20593 at commit [`979323a`](https://github.com/apache/spark/commit/979323a4e05cfdd5473369f5063967d69c40046c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20589: [SPARK-23394][UI] In RDD storage page show the executor ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20589 **[Test build #87360 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87360/testReport)** for PR 20589 at commit [`3ccad53`](https://github.com/apache/spark/commit/3ccad539410615156dea2ee83ad7d7841f520a46).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20382 Merged build finished. Test FAILed.
[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20382 **[Test build #87371 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87371/testReport)** for PR 20382 at commit [`647c5cd`](https://github.com/apache/spark/commit/647c5cdd1e3cb4138b597bd429e01308f50468a6).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20382 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87371/ Test FAILed.
[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20382 **[Test build #87371 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87371/testReport)** for PR 20382 at commit [`647c5cd`](https://github.com/apache/spark/commit/647c5cdd1e3cb4138b597bd429e01308f50468a6).
[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20382 Merged build finished. Test PASSed.
[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20382 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/838/ Test PASSed.
[GitHub] spark pull request #20382: [SPARK-23097][SQL][SS] Migrate text socket source...
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/20382#discussion_r167776323

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/sources/TextSocketStreamSuite.scala ---
    @@ -0,0 +1,246 @@
    +/*
    --- End diff --

Sorry @tdas, I moved the file with a plain "mv" rather than "git mv". The content doesn't change much; it is only adapted to the data source V2 API.
[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20382 **[Test build #87370 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87370/testReport)** for PR 20382 at commit [`068c050`](https://github.com/apache/spark/commit/068c050547a3ae002ac77d0ea2d48e2b82caa049).
[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20382 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/837/ Test PASSed.
[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20382 Build finished. Test PASSed.
[GitHub] spark pull request #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId i...
Github user rednaxelafx commented on a diff in the pull request:

https://github.com/apache/spark/pull/20419#discussion_r167775177

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -1226,14 +1226,21 @@ class CodegenContext {

       /**
        * Register a comment and return the corresponding place holder
    +   *
    +   * @param placeholderId a string for a place holder
    --- End diff --

Nit: can we rephrase this ScalaDoc a bit, maybe like:

```scala
/**
 * ...
 * @param placeholderId an optionally specified identifier for the comment's placeholder.
 *                      The caller should make sure this identifier is unique within the
 *                      compilation unit. If this argument is not specified, a fresh identifier
 *                      will be automatically created and used as the placeholder.
 * ...
 */
```
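The contract described in the suggested ScalaDoc can be sketched in isolation as follows. `CommentRegistry`, `registerComment`, and `commentFor` here are hypothetical, simplified stand-ins and are not Spark's `CodegenContext`; in particular, the `Option` parameter and the `c_N` naming scheme are assumptions made for illustration.

```scala
import scala.collection.mutable

/** Hypothetical, simplified registry; not Spark's actual CodegenContext. */
class CommentRegistry {
  private val comments = mutable.LinkedHashMap.empty[String, String]
  private var counter = 0

  /** Generates an identifier that is not yet in use. */
  private def freshId(): String =
    Iterator.continually { counter += 1; s"c_$counter" }
      .find(id => !comments.contains(id))
      .get

  /**
   * Registers `text` and returns the placeholder identifier.
   * A caller-supplied identifier must be unique; if none is given, a fresh one is created.
   */
  def registerComment(text: String, placeholderId: Option[String] = None): String = {
    val id = placeholderId.getOrElse(freshId())
    require(!comments.contains(id), s"placeholder '$id' is already registered")
    comments(id) = text
    id
  }

  /** Looks up the comment registered under `id`, if any. */
  def commentFor(id: String): Option[String] = comments.get(id)
}
```

In this sketch a duplicate caller-supplied identifier is rejected with `require`, which is one way to enforce the "caller should make sure this identifier is unique" clause rather than silently overwriting an earlier comment.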
[GitHub] spark issue #20592: [SPARK-23154][ML][DOC] Document backwards compatibility ...
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20592 LGTM.
[GitHub] spark issue #20590: [SPARK-23399][SQL] Register a task completion listener f...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20590 Thank you for the review, @viirya, @kiszk, @cloud-fan. Yep, I'm still trying to reproduce it with a test case. I'll let you know later.
[GitHub] spark pull request #20570: [spark-23382][WEB-UI]Spark Streaming ui about the...
Github user ajbozarth commented on a diff in the pull request:

https://github.com/apache/spark/pull/20570#discussion_r167770721

    --- Diff: core/src/main/resources/org/apache/spark/ui/static/webui.js ---
    @@ -80,4 +80,6 @@ $(function() {
       collapseTablePageLoad('collapse-aggregated-poolActiveStages','aggregated-poolActiveStages');
       collapseTablePageLoad('collapse-aggregated-tasks','aggregated-tasks');
       collapseTablePageLoad('collapse-aggregated-rdds','aggregated-rdds');
    +  collapseTablePageLoad('collapse-aggregated-activeBatches','aggregated-activeBatches');
    --- End diff --

This function just makes sure to persist collapsed tables on page reload.
[GitHub] spark issue #20511: [SPARK-23340][SQL] Empty float/double array columns in O...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20511 I added a test case for ORC-285 and updated the JIRA and PR description. Now, this PR aims to fix ORC-285 by updating ORC dependencies to 1.4.3.
[GitHub] spark issue #20511: [SPARK-23340][SQL] Empty float/double array columns in O...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20511 **[Test build #87369 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87369/testReport)** for PR 20511 at commit [`6f7fb4f`](https://github.com/apache/spark/commit/6f7fb4f95ea36638c97476f6a2b092469236e2c4).
[GitHub] spark issue #20511: [SPARK-23340][SQL] Empty float/double array columns in O...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20511 Merged build finished. Test PASSed.
[GitHub] spark issue #20511: [SPARK-23340][SQL] Empty float/double array columns in O...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20511 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/836/ Test PASSed.
[GitHub] spark pull request #20595: [SPARK-20090][FOLLOW-UP] Revert the deprecation o...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20595
[GitHub] spark issue #20595: [SPARK-20090][FOLLOW-UP] Revert the deprecation of `name...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20595 Merged to master and branch-2.3.
[GitHub] spark issue #20595: [SPARK-20090][FOLLOW-UP] Revert the deprecation of `name...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20595 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87367/ Test PASSed.
[GitHub] spark issue #20595: [SPARK-20090][FOLLOW-UP] Revert the deprecation of `name...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20595 Merged build finished. Test PASSed.
[GitHub] spark issue #20595: [SPARK-20090][FOLLOW-UP] Revert the deprecation of `name...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20595 **[Test build #87367 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87367/testReport)** for PR 20595 at commit [`494dccd`](https://github.com/apache/spark/commit/494dccd00217355f5277a65776a2768e3bab80ec).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20545: [SPARK-23359][SQL] Adds an alias 'names' of 'fieldNames'...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20545 **[Test build #87368 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87368/testReport)** for PR 20545 at commit [`e998ace`](https://github.com/apache/spark/commit/e998ace0d6350145385b0e843284ff20bcf4e539).
[GitHub] spark issue #20545: [SPARK-23359][SQL] Adds an alias 'names' of 'fieldNames'...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20545 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/835/ Test PASSed.
[GitHub] spark issue #20545: [SPARK-23359][SQL] Adds an alias 'names' of 'fieldNames'...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20545 Merged build finished. Test PASSed.
[GitHub] spark pull request #20545: [SPARK-23359][SQL] Adds an alias 'names' of 'fiel...
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20545#discussion_r167765482

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala ---
    @@ -104,6 +104,13 @@ case class StructType(fields: Array[StructField]) extends DataType with Seq[Stru
       /** Returns all field names in an array. */
       def fieldNames: Array[String] = fields.map(_.name)

    +  /**
    +   * Returns all field names in an array. This is an alias of `fieldNames`.
    +   *
    +   * @since 2.3.0
    --- End diff --

Yup, I was thinking about it too.
[GitHub] spark issue #20595: [SPARK-20090][FOLLOW-UP] Revert the deprecation of `name...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20595 LGTM
[GitHub] spark pull request #20545: [SPARK-23359][SQL] Adds an alias 'names' of 'fiel...
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20545#discussion_r167764844

    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeSuite.scala ---
    @@ -134,6 +134,15 @@ class DataTypeSuite extends SparkFunSuite {
         assert(mapped === expected)
       }

    +  test("fieldNames and names returns field names") {
    +    val struct = StructType(
    +      StructField("a", LongType) :: StructField("b", FloatType) :: Nil)
    +
    +    assert(struct.fieldNames === Seq("a", "b"))
    +    assert(struct.names === Seq("a", "b"))
    +    assert(struct.fieldNames === struct.names)
    --- End diff --

This line is redundant.
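To see why the third assertion adds nothing, here is a minimal sketch of the alias using hypothetical stand-in types (`Field` and `Struct` are not Spark's `StructField` and `StructType`). Once both accessors are checked against the same expected sequence, their equality to each other follows.

```scala
// Hypothetical, simplified stand-ins for illustration only.
object StructSketch {
  final case class Field(name: String, dataType: String)

  final case class Struct(fields: Array[Field]) {
    /** Returns all field names in an array. */
    def fieldNames: Array[String] = fields.map(_.name)

    /** Alias of `fieldNames`: delegates rather than recomputing. */
    def names: Array[String] = fieldNames
  }
}
```

Note that outside ScalaTest, `==` on two `Array` values compares references, so a plain `assert` should compare `toSeq` results (or use `sameElements`); ScalaTest's `===` treats arrays structurally, which is why the quoted test can compare an array with a `Seq` directly.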
[GitHub] spark pull request #20545: [SPARK-23359][SQL] Adds an alias 'names' of 'fiel...
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20545#discussion_r167764797

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala ---
    @@ -104,6 +104,13 @@ case class StructType(fields: Array[StructField]) extends DataType with Seq[Stru
       /** Returns all field names in an array. */
       def fieldNames: Array[String] = fields.map(_.name)

    +  /**
    +   * Returns all field names in an array. This is an alias of `fieldNames`.
    +   *
    +   * @since 2.3.0
    --- End diff --

+1
[GitHub] spark issue #20595: [SPARK-20090][FOLLOW-UP] Revert the deprecation of `name...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20595 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/834/ Test PASSed.
[GitHub] spark issue #20595: [SPARK-20090][FOLLOW-UP] Revert the deprecation of `name...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20595 **[Test build #87367 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87367/testReport)** for PR 20595 at commit [`494dccd`](https://github.com/apache/spark/commit/494dccd00217355f5277a65776a2768e3bab80ec).
[GitHub] spark issue #20595: [SPARK-20090][FOLLOW-UP] Revert the deprecation of `name...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20595 Merged build finished. Test PASSed.
[GitHub] spark issue #20595: [SPARK-20090][FOLLOW-UP] Revert the deprecation of `name...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20595 cc @rxin @cloud-fan @HyukjinKwon
[GitHub] spark pull request #20595: [SPARK-20090][FOLLOW-UP] Revert the deprecation o...
GitHub user gatorsmile opened a pull request:

https://github.com/apache/spark/pull/20595

[SPARK-20090][FOLLOW-UP] Revert the deprecation of `names` in PySpark

## What changes were proposed in this pull request?

Deprecating the field `names` in PySpark was not intended. This PR reverts the change.

## How was this patch tested?

N/A

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark removeDeprecate

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20595.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20595

commit 494dccd00217355f5277a65776a2768e3bab80ec
Author: gatorsmile
Date: 2018-02-13T05:19:03Z

    fix.
[GitHub] spark issue #20594: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20594 cc @jkbradley
[GitHub] spark pull request #20590: [SPARK-23399][SQL] Register a task completion lis...
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20590#discussion_r167762763

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala ---
    @@ -188,6 +188,9 @@ class OrcFileFormat
           if (enableVectorizedReader) {
             val batchReader = new OrcColumnarBatchReader(
               enableOffHeapColumnVector && taskContext.isDefined, copyToSpark, capacity)
    +        val iter = new RecordReaderIterator(batchReader)
    +        Option(TaskContext.get()).foreach(_.addTaskCompletionListener(_ => iter.close()))
    +
             batchReader.initialize(fileSplit, taskAttemptContext)
    --- End diff --

I ask because when I tried to verify this manually on my local machine, `close` seemed to be called even before this change. Maybe I'm missing something, or it depends on the environment.
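The ordering in the diff, registering the completion listener before calling `initialize`, can be illustrated without Spark. In the sketch below, `MiniTaskContext`, `MiniReader`, and `runTask` are hypothetical stand-ins and do not reflect Spark's `TaskContext` or `OrcColumnarBatchReader`; the point is only that a listener registered first still fires when initialization throws, and that a `closed` flag gives a test something to observe.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-ins; not Spark classes.
object CompletionListenerSketch {
  /** Collects callbacks and runs them when the task is marked completed. */
  final class MiniTaskContext {
    private val listeners = ArrayBuffer.empty[() => Unit]

    def addCompletionListener(f: () => Unit): Unit = {
      listeners += f
      ()
    }

    def markCompleted(): Unit = listeners.reverseIterator.foreach(f => f())
  }

  /** A reader whose initialization can be made to fail on demand. */
  final class MiniReader extends AutoCloseable {
    var closed = false

    def initialize(fail: Boolean): Unit =
      if (fail) throw new IllegalStateException("simulated failure during initialize")

    override def close(): Unit = { closed = true }
  }

  /** Registers close() before initialize(), so a failing initialize is still cleaned up. */
  def runTask(failInit: Boolean): MiniReader = {
    val ctx = new MiniTaskContext
    val reader = new MiniReader
    ctx.addCompletionListener(() => reader.close())
    try reader.initialize(failInit)
    catch { case _: IllegalStateException => () } // the task fails, completion still fires
    finally ctx.markCompleted()
    reader
  }
}
```

If the listener were registered only after `initialize` returned, the failing path would never add it, and `closed` would stay `false`. That difference is what a regression test for this change would need to observe.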
[GitHub] spark pull request #20545: [SPARK-23359][SQL] Adds an alias 'names' of 'fiel...
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20545#discussion_r167762799

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala ---
    @@ -104,6 +104,13 @@ case class StructType(fields: Array[StructField]) extends DataType with Seq[Stru
       /** Returns all field names in an array. */
       def fieldNames: Array[String] = fields.map(_.name)

    +  /**
    +   * Returns all field names in an array. This is an alias of `fieldNames`.
    +   *
    +   * @since 2.3.0
    --- End diff --

This is too late to be merged to 2.3.0. Please change it to 2.4.0.
[GitHub] spark pull request #20477: [SPARK-23303][SQL] improve the explain result for...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20477
[GitHub] spark pull request #20590: [SPARK-23399][SQL] Register a task completion lis...
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20590#discussion_r167762591

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala ---
    @@ -188,6 +188,9 @@ class OrcFileFormat
           if (enableVectorizedReader) {
             val batchReader = new OrcColumnarBatchReader(
               enableOffHeapColumnVector && taskContext.isDefined, copyToSpark, capacity)
    +        val iter = new RecordReaderIterator(batchReader)
    +        Option(TaskContext.get()).foreach(_.addTaskCompletionListener(_ => iter.close()))
    +
             batchReader.initialize(fileSplit, taskAttemptContext)
    --- End diff --

@dongjoon-hyun Thanks for this fix! My question is: how do we know that `close` was not called before and is called now? Have you verified it?
[GitHub] spark issue #20594: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20594 **[Test build #87366 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87366/testReport)** for PR 20594 at commit [`9cd7c86`](https://github.com/apache/spark/commit/9cd7c86fad04c814b2c8f5547583122ba12c359b).
[GitHub] spark issue #20594: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20594 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/833/ Test PASSed.
[GitHub] spark issue #20594: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20594 Merged build finished. Test PASSed.
[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20477 LGTM Merged to master.
[GitHub] spark pull request #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple ...
Github user viirya closed the pull request at: https://github.com/apache/spark/pull/20566
[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20566 I'm closing this in favor of the quick fix #20594, based on the discussion in JIRA. I will re-open it if it is needed later.
[GitHub] spark pull request #20594: [SPARK-23377][ML] Fixes Bucketizer with multiple ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20594#discussion_r167762013

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala ---
    @@ -290,6 +293,27 @@ object Bucketizer extends DefaultParamsReadable[Bucketizer] {
         }
       }
    +
    +  private[Bucketizer] class BucketizerWriter(instance: Bucketizer) extends MLWriter {
    +
    +    override protected def saveImpl(path: String): Unit = {
    +      // SPARK-23377: The default params will be saved and loaded as user-supplied params.
    +      // Once `inputCols` is set, the default value of `outputCol` param causes the error
    +      // when checking exclusive params. As a temporary to fix it, we remove the default
    +      // value of `outputCol` if `inputCols` is set before saving.
    +      // TODO: If we modify the persistence mechanism later to better handle default params,
    +      // we can get rid of this.
    +      var removedOutputCol: Option[String] = None
    +      if (instance.isSet(instance.inputCols)) {
    +        removedOutputCol = instance.getDefault(instance.outputCol)
    +        instance.clearDefault(instance.outputCol)
    +      }
    +      DefaultParamsWriter.saveMetadata(instance, path, sc)
    +      // Add the default param back.
    +      removedOutputCol.map(instance.setDefault(instance.outputCol, _))
    --- End diff --

Although the saving logic is the same as in `QuantileDiscretizerWriter`, I've left the two as duplicates for now since this is a quick fix. If there is a strong preference, I can extract a common class for it.
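The clear-save-restore sequence in the diff above can be sketched outside Spark as follows. This is only a toy model under stated assumptions: `ToyStage`, `snapshot`, and `saveWithoutOutputColDefault` are hypothetical names and are not part of Spark's ML persistence API. The sketch also restores the default in a `finally` block, which is a variation on the quoted patch rather than what the patch itself does.

```scala
import scala.collection.mutable

object TemporaryDefaultSketch {
  // Hypothetical stand-in for an ML stage: defaults and user-set params are kept apart.
  final class ToyStage {
    val defaults: mutable.Map[String, String] = mutable.Map("outputCol" -> "toy_output")
    val userSet: mutable.Map[String, String] = mutable.Map.empty
  }

  // Hypothetical serializer: captures whatever defaults and user-set values exist right now.
  def snapshot(stage: ToyStage): Map[String, String] =
    (stage.defaults ++ stage.userSet).toMap

  // If `inputCols` is user-set, drop the default `outputCol` for the duration of the save,
  // then put it back so the in-memory instance is unchanged afterwards.
  def saveWithoutOutputColDefault(stage: ToyStage): Map[String, String] = {
    val removed: Option[String] =
      if (stage.userSet.contains("inputCols")) stage.defaults.remove("outputCol") else None
    try snapshot(stage)
    finally removed.foreach(v => stage.defaults.update("outputCol", v))
  }
}
```

Restoring in `finally` means a failed save would not leave the instance without its default; whether that matters for the real writer is a judgment call for the PR author.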
[GitHub] spark pull request #20594: [SPARK-23377][ML] Fixes Bucketizer with multiple ...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/20594

[SPARK-23377][ML] Fixes Bucketizer with multiple columns persistence bug

## What changes were proposed in this pull request?

Problem: Since 2.3, `Bucketizer` supports multiple input/output columns. During transformation we check whether mutually exclusive params are set; e.g., if `inputCols` and `outputCol` are both set, an error is thrown.

However, when a `Bucketizer` is written, the default params and the user-supplied params appear to be merged. All saved params are then loaded back and set on the created model instance, so the default `outputCol` param from the `HasOutputCol` trait ends up in `paramMap` and becomes a user-supplied param. That makes the exclusive-params check fail.

Fix: This changes the saving logic of `Bucketizer` to handle this case. It is a quick fix to make it in time for 2.3; we should consider modifying the persistence mechanism later. Please see the discussion in the JIRA.

Note: The multi-column `QuantileDiscretizer` has the same issue.

## How was this patch tested?

Modified tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 SPARK-23377-2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20594.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20594

commit 9cd7c86fad04c814b2c8f5547583122ba12c359b
Author: Liang-Chi Hsieh
Date: 2018-02-13T03:51:41Z

    Remove outputCol default value if inputCols is set.
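To make the failure mode described in this PR concrete, here is a minimal sketch in plain Scala. It is a toy model, not Spark's `Params` implementation: `Stage`, `save`, `load`, `saveWithFix`, and `exclusiveParamsOk` are hypothetical names, and the only thing taken from the description is the idea that defaults merged at save time come back as user-supplied values.

```scala
object ParamPersistenceSketch {
  // Toy stage: `defaults` and `userSet` are tracked separately, as the description implies.
  final case class Stage(defaults: Map[String, String], userSet: Map[String, String]) {
    // A param counts as "set" only if the user supplied it explicitly.
    def isSet(name: String): Boolean = userSet.contains(name)
  }

  // Exclusive-params check: `inputCols` and `outputCol` must not both be user-set.
  def exclusiveParamsOk(s: Stage): Boolean =
    !(s.isSet("inputCols") && s.isSet("outputCol"))

  // Naive save: defaults and user-set values are merged into one flat map.
  def save(s: Stage): Map[String, String] = s.defaults ++ s.userSet

  // Load: every saved entry is applied as if the user had set it.
  def load(saved: Map[String, String], defaults: Map[String, String]): Stage =
    Stage(defaults, saved)

  // Quick fix in the spirit of the PR: skip the default `outputCol` when `inputCols` is set.
  def saveWithFix(s: Stage): Map[String, String] = {
    val defaultsToSave = if (s.isSet("inputCols")) s.defaults - "outputCol" else s.defaults
    defaultsToSave ++ s.userSet
  }
}
```

In this model a stage with only `inputCols` set passes the check before saving, fails it after a `save`/`load` round trip, and passes again when `saveWithFix` is used instead.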
[GitHub] spark issue #20593: [SPARK-23230][SQL][BRANCH-2.2]When hive.default.fileform...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20593 **[Test build #87365 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87365/testReport)** for PR 20593 at commit [`979323a`](https://github.com/apache/spark/commit/979323a4e05cfdd5473369f5063967d69c40046c).
[GitHub] spark issue #20548: [SPARK-23316][SQL] AnalysisException after max iteration...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20548 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/832/ Test PASSed.
[GitHub] spark issue #20548: [SPARK-23316][SQL] AnalysisException after max iteration...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20548 Merged build finished. Test PASSed.
[GitHub] spark issue #20593: [SPARK-23230][SQL][BRANCH-2.2]When hive.default.fileform...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20593 ok to test
[GitHub] spark pull request #20548: [SPARK-23316][SQL] AnalysisException after max it...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20548#discussion_r167761502 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala --- @@ -298,22 +298,24 @@ object DataType { * Returns true if the two data types share the same "shape", i.e. the types (including * nullability) are the same, but the field names don't need to be the same. --- End diff -- This comment needs an update too.
[GitHub] spark pull request #20548: [SPARK-23316][SQL] AnalysisException after max it...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20548#discussion_r167761409 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala --- @@ -298,22 +298,24 @@ object DataType { * Returns true if the two data types share the same "shape", i.e. the types (including * nullability) are the same, but the field names don't need to be the same. */ - def equalsStructurally(from: DataType, to: DataType): Boolean = { + def equalsStructurally(from: DataType, to: DataType, + ignoreNullability: Boolean = false): Boolean = { --- End diff -- We can fix it when merging the PR
[GitHub] spark pull request #20548: [SPARK-23316][SQL] AnalysisException after max it...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20548#discussion_r167761351 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala --- @@ -298,22 +298,24 @@ object DataType { * Returns true if the two data types share the same "shape", i.e. the types (including * nullability) are the same, but the field names don't need to be the same. */ - def equalsStructurally(from: DataType, to: DataType): Boolean = { + def equalsStructurally(from: DataType, to: DataType, + ignoreNullability: Boolean = false): Boolean = { --- End diff -- Nit: the indents.
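For readers unfamiliar with the "same shape" comparison being reviewed here, the following sketch shows what an `ignoreNullability` flag could mean on a toy type hierarchy. Only the method name and parameter list come from the diff; `DType`, `ArrayT`, `StructT`, and `Field` are hypothetical stand-ins for Spark's `DataType` classes, and the body is an assumption rather than the code under review.

```scala
object StructuralEqualitySketch {
  // Toy type hierarchy standing in for Spark's DataType classes.
  sealed trait DType
  case object IntT extends DType
  case object StringT extends DType
  final case class ArrayT(element: DType, containsNull: Boolean) extends DType
  final case class Field(name: String, dataType: DType, nullable: Boolean)
  final case class StructT(fields: Seq[Field]) extends DType

  // Same shape: types match position by position; field names are never compared.
  // Nullability is compared unless `ignoreNullability` is true.
  def equalsStructurally(from: DType, to: DType, ignoreNullability: Boolean = false): Boolean =
    (from, to) match {
      case (ArrayT(fromElem, fromNull), ArrayT(toElem, toNull)) =>
        (ignoreNullability || fromNull == toNull) &&
          equalsStructurally(fromElem, toElem, ignoreNullability)
      case (StructT(fromFields), StructT(toFields)) =>
        fromFields.length == toFields.length &&
          fromFields.zip(toFields).forall { case (l, r) =>
            (ignoreNullability || l.nullable == r.nullable) &&
              equalsStructurally(l.dataType, r.dataType, ignoreNullability)
          }
      case (a, b) => a == b
    }
}
```

With the default `ignoreNullability = false`, two structs that differ only in a field's nullability are reported as different; passing `true` treats them as structurally equal.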
[GitHub] spark issue #20548: [SPARK-23316][SQL] AnalysisException after max iteration...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20548 **[Test build #87364 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87364/testReport)** for PR 20548 at commit [`367c70b`](https://github.com/apache/spark/commit/367c70bd3aa9cf82358462deb624b7634567f0c9).
[GitHub] spark issue #20548: [SPARK-23316][SQL] AnalysisException after max iteration...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20548 retest this please
[GitHub] spark pull request #20565: [SPARK-23379][SQL] skip when setting the same cur...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20565
[GitHub] spark issue #20565: [SPARK-23379][SQL] skip when setting the same current da...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20565 Thanks! Merged to master.
[GitHub] spark issue #20591: [SPARK-23400] [SQL] Add a constructors for ScalaUDF
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20591 **[Test build #87363 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87363/testReport)** for PR 20591 at commit [`08b39d0`](https://github.com/apache/spark/commit/08b39d093d16b8e803557eba6b525a35b0f13f75).
[GitHub] spark issue #20591: [SPARK-23400] [SQL] Add a constructors for ScalaUDF
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20591 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/831/ Test PASSed.
[GitHub] spark issue #20591: [SPARK-23400] [SQL] Add a constructors for ScalaUDF
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20591 Merged build finished. Test PASSed.
[GitHub] spark issue #20546: [SPARK-20659][Core] Removing sc.getExecutorStorageStatus...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20546 **[Test build #87361 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87361/testReport)** for PR 20546 at commit [`543caf8`](https://github.com/apache/spark/commit/543caf879468a3ade8350934716443207d2eaeca).
[GitHub] spark issue #20424: [Spark-23240][python] Better error message when extraneo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20424 **[Test build #87362 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87362/testReport)** for PR 20424 at commit [`b63abee`](https://github.com/apache/spark/commit/b63abee881f2b4379f375500d51fdef706d6d512).
[GitHub] spark issue #19108: [SPARK-21898][ML] Feature parity for KolmogorovSmirnovTe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19108 **[Test build #4093 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4093/testReport)** for PR 19108 at commit [`62a8fcd`](https://github.com/apache/spark/commit/62a8fcd29da6d81981f29dfc3f6e3cb77c7c6fc3).
* This patch **fails PySpark unit tests**.
* This patch **does not merge cleanly**.
* This patch adds no public classes.
[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20477 Merged build finished. Test PASSed.
[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20477 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87358/ Test PASSed.
[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20477 **[Test build #87358 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87358/testReport)** for PR 20477 at commit [`0cc0600`](https://github.com/apache/spark/commit/0cc0600b8f6f3a46189ae38850835f34b57bd945).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20589: [SPARK-23394][UI] In RDD storage page show the executor ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20589 **[Test build #87360 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87360/testReport)** for PR 20589 at commit [`3ccad53`](https://github.com/apache/spark/commit/3ccad539410615156dea2ee83ad7d7841f520a46).
[GitHub] spark issue #20589: [SPARK-23394][UI] In RDD storage page show the executor ...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/20589 retest this please
[GitHub] spark pull request #20490: [SPARK-23323][SQL]: Support commit coordinator fo...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20490
[GitHub] spark issue #20525: [SPARK-23271[SQL] Parquet output contains only _SUCCESS ...
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/20525 @cloud-fan Got it, Wenchen. Thanks for your reply. I will hold off on 20579 for a while until we get this in.
[GitHub] spark pull request #20387: [SPARK-23203][SQL]: DataSourceV2: Use immutable l...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20387#discussion_r167754111

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala ---
    @@ -37,22 +100,129 @@ case class DataSourceV2Relation(
       }
     
       override def newInstance(): DataSourceV2Relation = {
    -    copy(output = output.map(_.newInstance()))
    +    // projection is used to maintain id assignment.
    +    // if projection is not set, use output so the copy is not equal to the original
    +    copy(projection = projection.map(_.newInstance()))
       }
     }
     
     /**
      * A specialization of DataSourceV2Relation with the streaming bit set to true. Otherwise identical
      * to the non-streaming relation.
      */
    -class StreamingDataSourceV2Relation(
    +case class StreamingDataSourceV2Relation(
         output: Seq[AttributeReference],
    -    reader: DataSourceReader) extends DataSourceV2Relation(output, reader) {
    +    reader: DataSourceReader)
    +    extends LeafNode with DataSourceReaderHolder with MultiInstanceRelation {
       override def isStreaming: Boolean = true
    +
    +  override def canEqual(other: Any): Boolean = other.isInstanceOf[StreamingDataSourceV2Relation]
    +
    +  override def newInstance(): LogicalPlan = copy(output = output.map(_.newInstance()))
     }
     
     object DataSourceV2Relation {
    -  def apply(reader: DataSourceReader): DataSourceV2Relation = {
    -    new DataSourceV2Relation(reader.readSchema().toAttributes, reader)
    +  private implicit class SourceHelpers(source: DataSourceV2) {
    +    def asReadSupport: ReadSupport = {
    +      source match {
    +        case support: ReadSupport =>
    +          support
    +        case _: ReadSupportWithSchema =>
    +          // this method is only called if there is no user-supplied schema. if there is no
    +          // user-supplied schema and ReadSupport was not implemented, throw a helpful exception.
    +          throw new AnalysisException(s"Data source requires a user-supplied schema: $name")
    +        case _ =>
    +          throw new AnalysisException(s"Data source is not readable: $name")
    +      }
    +    }
    +
    +    def asReadSupportWithSchema: ReadSupportWithSchema = {
    +      source match {
    +        case support: ReadSupportWithSchema =>
    +          support
    +        case _: ReadSupport =>
    --- End diff --

There was a historical reason for doing this: https://github.com/apache/spark/pull/15046

I agree it's clearer not to allow this, since data source v2 is brand new. But this change deserves its own JIRA ticket and a separate PR. Would you mind creating one? Or I can do that for you.
[GitHub] spark pull request #20387: [SPARK-23203][SQL]: DataSourceV2: Use immutable l...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20387#discussion_r167753210

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala ---
    @@ -17,17 +17,80 @@
     package org.apache.spark.sql.execution.datasources.v2
     
    +import scala.collection.JavaConverters._
    +
    +import org.apache.spark.sql.AnalysisException
     import org.apache.spark.sql.catalyst.analysis.MultiInstanceRelation
    -import org.apache.spark.sql.catalyst.expressions.AttributeReference
    -import org.apache.spark.sql.catalyst.plans.logical.{LeafNode, Statistics}
    -import org.apache.spark.sql.sources.v2.reader._
    +import org.apache.spark.sql.catalyst.expressions.{AttributeReference, Expression}
    +import org.apache.spark.sql.catalyst.plans.QueryPlan
    +import org.apache.spark.sql.catalyst.plans.logical.{LeafNode, LogicalPlan, Statistics}
    +import org.apache.spark.sql.execution.datasources.DataSourceStrategy
    +import org.apache.spark.sql.sources.{DataSourceRegister, Filter}
    +import org.apache.spark.sql.sources.v2.{DataSourceOptions, DataSourceV2, ReadSupport, ReadSupportWithSchema}
    +import org.apache.spark.sql.sources.v2.reader.{DataSourceReader, SupportsPushDownCatalystFilters, SupportsPushDownFilters, SupportsPushDownRequiredColumns, SupportsReportStatistics}
    +import org.apache.spark.sql.types.StructType
     
     case class DataSourceV2Relation(
    -    output: Seq[AttributeReference],
    -    reader: DataSourceReader)
    -  extends LeafNode with MultiInstanceRelation with DataSourceReaderHolder {
    +    source: DataSourceV2,
    +    options: Map[String, String],
    +    projection: Seq[AttributeReference],
    +    filters: Option[Seq[Expression]] = None,
    +    userSchema: Option[StructType] = None) extends LeafNode with MultiInstanceRelation {
    --- End diff --

Because we call it `userSpecifiedSchema` in `DataFrameReader` and `DataSource`, I think it would be clearer to use the same name here.
[GitHub] spark issue #20591: [SPARK-23390] [SQL] Add two extra constructors for Scala...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20591 The JIRA ID is wrong...
[GitHub] spark issue #20490: [SPARK-23323][SQL]: Support commit coordinator for DataS...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20490

This is the writing code I was talking about:

```
// write the data and commit this writer.
Utils.tryWithSafeFinallyAndFailureCallbacks(block = {
  iter.foreach(dataWriter.write)
  logInfo(s"Writer for partition ${context.partitionId()} is committing.")
  val msg = dataWriter.commit()
  logInfo(s"Writer for partition ${context.partitionId()} committed.")
  msg
})(catchBlock = {
  // If there is an error, abort this writer
  logError(s"Writer for partition ${context.partitionId()} is aborting.")
  dataWriter.abort()
  logError(s"Writer for partition ${context.partitionId()} aborted.")
})
```

What we can probably do is check for job cancellation periodically during `iter.foreach(dataWriter.write)`, e.g. once every 1k writes.

Anyway, let's merge this PR first. I'm only merging to master; let's backport it to 2.3 if RC3 fails (which is very likely, as several regressions have already shown up).
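The periodic check suggested above could look roughly like the sketch below. It is a standalone illustration under stated assumptions: `writeAll`, `isCancelled`, `checkEvery`, and `WriteCancelledException` are hypothetical names, and how a real task would learn about job cancellation is deliberately left behind the `isCancelled` callback.

```scala
object PeriodicCancellationSketch {
  // Hypothetical exception used to leave the write loop when cancellation is observed.
  final class WriteCancelledException(msg: String) extends RuntimeException(msg)

  // Writes every record, polling `isCancelled` once per `checkEvery` writes
  // (before write 0, before write `checkEvery`, and so on).
  def writeAll[T](
      records: Iterator[T],
      write: T => Unit,
      isCancelled: () => Boolean,
      checkEvery: Int = 1000): Long = {
    require(checkEvery > 0, "checkEvery must be positive")
    var written = 0L
    records.foreach { record =>
      if (written % checkEvery == 0 && isCancelled()) {
        throw new WriteCancelledException(s"Cancelled after $written writes")
      }
      write(record)
      written += 1
    }
    written
  }
}
```

If a loop like this replaced `iter.foreach(dataWriter.write)` in the quoted code, the thrown exception would presumably reach the `catchBlock` and abort the writer. Polling only every `checkEvery` records keeps the per-record overhead small, at the cost of up to `checkEvery - 1` extra writes after cancellation.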
[GitHub] spark issue #20548: [SPARK-23316][SQL] AnalysisException after max iteration...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20548 The fix LGTM. cc @sameeragarwal
[GitHub] spark issue #20525: [SPARK-23271[SQL] Parquet output contains only _SUCCESS ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20525 Can we hold it for a while? If RC3 fails, let's merge this to the 2.3 branch. If RC3 passes, we should only merge it to master.
[GitHub] spark pull request #20570: [spark-23382][WEB-UI]Spark Streaming ui about the...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/20570#discussion_r167750238

    --- Diff: core/src/main/resources/org/apache/spark/ui/static/webui.js ---
    @@ -80,4 +80,6 @@ $(function() {
       collapseTablePageLoad('collapse-aggregated-poolActiveStages','aggregated-poolActiveStages');
       collapseTablePageLoad('collapse-aggregated-tasks','aggregated-tasks');
       collapseTablePageLoad('collapse-aggregated-rdds','aggregated-rdds');
    +  collapseTablePageLoad('collapse-aggregated-activeBatches','aggregated-activeBatches');
    --- End diff --

Oh, I see. So this doesn't also collapse the table by default? I wondered because of what the name "collapseTablePageLoad" seemed to suggest. Sure, the capability is fine.
[GitHub] spark issue #20590: [SPARK-23399][SQL] Register a task completion listener f...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20590 I know it's hard to add a test: we need a malformed ORC file to make the reader fail midway. @dongjoon-hyun, do you think it's possible to generate such an ORC file?
[GitHub] spark issue #20590: [SPARK-23399][SQL] Register a task completion listener f...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20590 Looks reasonable. `batchReader.initBatch` throws `FileNotException`, then we enter `afterEach`, detect the file stream leak, and fail.
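As a rough illustration of the leak scenario discussed here, the sketch below registers cleanup before initialization so that a reader is still closed when `initBatch` throws. Everything in it is a toy: `ToyTaskContext`, `ToyReader`, and `initWithCleanup` are hypothetical names, and the assumption that the PR registers the listener before initializing the reader is inferred from its title, not confirmed by the thread.

```scala
import scala.collection.mutable.ArrayBuffer

object CompletionListenerSketch {
  // Toy task context: collects callbacks and runs them when the task ends.
  final class ToyTaskContext {
    private val listeners = ArrayBuffer.empty[() => Unit]
    def addTaskCompletionListener(f: () => Unit): Unit = { listeners += f; () }
    def markTaskCompleted(): Unit = listeners.foreach(f => f())
  }

  // Toy reader whose initialization can be made to fail, like a reader hitting a bad file.
  final class ToyReader {
    var closed = false
    def initBatch(fail: Boolean): Unit =
      if (fail) throw new java.io.IOException("simulated failure during initBatch")
    def close(): Unit = closed = true
  }

  // Register the close callback first, then initialize: a failure in `initBatch`
  // can no longer leave the reader open once the task completes.
  def initWithCleanup(ctx: ToyTaskContext, reader: ToyReader, fail: Boolean): Unit = {
    ctx.addTaskCompletionListener(() => reader.close())
    reader.initBatch(fail)
  }
}
```

If the two statements in `initWithCleanup` were swapped, a failing `initBatch` would skip the registration and the reader would stay open, which is the kind of leak a check in `afterEach` would flag.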
[GitHub] spark issue #20583: [SPARK-23392][TEST] Add some test cases for images featu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20583 Merged build finished. Test PASSed.
[GitHub] spark issue #20583: [SPARK-23392][TEST] Add some test cases for images featu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20583 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87359/ Test PASSed.
[GitHub] spark issue #20583: [SPARK-23392][TEST] Add some test cases for images featu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20583 **[Test build #87359 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87359/testReport)** for PR 20583 at commit [`4c18e23`](https://github.com/apache/spark/commit/4c18e232725f18156b56138471c52918d3fb83b3).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20406: [SPARK-23230][SQL]When hive.default.fileformat is other ...
Github user cxzl25 commented on the issue: https://github.com/apache/spark/pull/20406 Thanks for your help, @dongjoon-hyun @gasparms. I submitted a separate PR for 2.2: https://github.com/apache/spark/pull/20593