[GitHub] spark issue #20314: [SPARK-23104][K8S][Docs] Changes to Kubernetes scheduler...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20314 **[Test build #86337 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86337/testReport)** for PR 20314 at commit [`b13ad38`](https://github.com/apache/spark/commit/b13ad382f3ce8a3f33b553e954b78fa9185882ba). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20314: [SPARK-23104][K8S][Docs] Changes to Kubernetes scheduler...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20314 **[Test build #86336 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86336/testReport)** for PR 20314 at commit [`365aa9c`](https://github.com/apache/spark/commit/365aa9c0662dcf246d5dd54a1fb01cdc69e59dcd).
[GitHub] spark pull request #20307: [SPARK-23141][SQL][PYSPARK] Support data type str...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20307#discussion_r162308708 --- Diff: python/pyspark/sql/functions.py --- @@ -2108,7 +2108,8 @@ def udf(f=None, returnType=StringType()): can fail on special rows, the workaround is to incorporate the condition into the functions. :param f: python function if used as a standalone function -:param returnType: a :class:`pyspark.sql.types.DataType` object +:param returnType: the return type of the registered user-defined function. The value can be --- End diff -- Seems like a typo: `the return type of the registered user-defined function.` -> `the return type of the user-defined function.`?
[GitHub] spark pull request #20314: [SPARK-23104][K8S][Docs] Changes to Kubernetes sc...
GitHub user foxish opened a pull request: https://github.com/apache/spark/pull/20314

[SPARK-23104][K8S][Docs] Changes to Kubernetes scheduler documentation

## What changes were proposed in this pull request?

Docs changes:
- Adding a warning that the backend is experimental.
- Removing a defunct internal-only option from documentation
- Clarifying that node selectors can be used right away, and other minor cosmetic changes

## How was this patch tested?

Docs only change

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/foxish/spark ambiguous-docs

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20314.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #20314

commit 27fc9cc2a4f000aa532240db7c871037292324c6
Author: foxish
Date: 2018-01-17T22:46:15Z
Basic changes

commit 5369564344f2655b5453740aba6de867383c7ac3
Author: foxish
Date: 2018-01-18T10:49:03Z
Add section about backend

commit 7b45c8d728a114704647ce714643db1e35174b7f
Author: foxish
Date: 2018-01-18T10:49:41Z
Remove option to set executor pod prefix

commit 365aa9c0662dcf246d5dd54a1fb01cdc69e59dcd
Author: foxish
Date: 2018-01-18T10:57:37Z
clarify
[GitHub] spark issue #20314: [SPARK-23104][K8S][Docs] Changes to Kubernetes scheduler...
Github user foxish commented on the issue: https://github.com/apache/spark/pull/20314 cc/ @vanzin @sameeragarwal @liyinan926 @ash211
[GitHub] spark issue #20312: [Docs] change to dataset for java code in structured-str...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20312 **[Test build #4059 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4059/testReport)** for PR 20312 at commit [`10afff2`](https://github.com/apache/spark/commit/10afff276d23cbe98f8e4187791fb1358eff15fb).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20305: [SPARK-23140][SQL] Add DataSourceV2Strategy to Hive Sess...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20305 Merged build finished. Test PASSed.
[GitHub] spark issue #20305: [SPARK-23140][SQL] Add DataSourceV2Strategy to Hive Sess...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20305 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86328/ Test PASSed.
[GitHub] spark issue #20305: [SPARK-23140][SQL] Add DataSourceV2Strategy to Hive Sess...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20305 **[Test build #86328 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86328/testReport)** for PR 20305 at commit [`a978dcc`](https://github.com/apache/spark/commit/a978dcc3052f0df4485594c6cd4a944d8b6dab5e).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20312: [Docs] change to dataset for java code in structured-str...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20312 **[Test build #4059 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4059/testReport)** for PR 20312 at commit [`10afff2`](https://github.com/apache/spark/commit/10afff276d23cbe98f8e4187791fb1358eff15fb).
[GitHub] spark issue #20313: [SPARK-22974][ML] Attach attributes to output column of ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20313 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86332/ Test PASSed.
[GitHub] spark issue #20313: [SPARK-22974][ML] Attach attributes to output column of ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20313 Merged build finished. Test PASSed.
[GitHub] spark issue #20313: [SPARK-22974][ML] Attach attributes to output column of ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20313 **[Test build #86332 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86332/testReport)** for PR 20313 at commit [`aeae308`](https://github.com/apache/spark/commit/aeae308055cd16d95ef9ff86df882ec1aa20).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20305: [SPARK-23140][SQL] Add DataSourceV2Strategy to Hive Sess...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20305 Merged build finished. Test PASSed.
[GitHub] spark issue #20305: [SPARK-23140][SQL] Add DataSourceV2Strategy to Hive Sess...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20305 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86326/ Test PASSed.
[GitHub] spark issue #20305: [SPARK-23140][SQL] Add DataSourceV2Strategy to Hive Sess...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20305 **[Test build #86326 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86326/testReport)** for PR 20305 at commit [`094b7eb`](https://github.com/apache/spark/commit/094b7ebbaf7bfe75e706cf42565f0c077938e821).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20309: [SPARK-23143][SS][PYTHON] Added python API for setting c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20309 Merged build finished. Test PASSed.
[GitHub] spark issue #20309: [SPARK-23143][SS][PYTHON] Added python API for setting c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20309 **[Test build #86334 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86334/testReport)** for PR 20309 at commit [`5f905aa`](https://github.com/apache/spark/commit/5f905aabdbf8ae402c90769e6d7841a7a5d76d70).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20309: [SPARK-23143][SS][PYTHON] Added python API for setting c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20309 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86334/ Test PASSed.
[GitHub] spark pull request #20306: [SPARK-23054][SQL][PYSPARK][FOLLOWUP] Use sqlType...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/20306#discussion_r162299712 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -838,6 +839,7 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String |$evPrim = $buffer.build(); """.stripMargin } + case pudt: PythonUserDefinedType => castToStringCode(pudt.sqlType, ctx) --- End diff -- How about what I suggested at https://github.com/apache/spark/pull/20306#discussion_r162269190?
[GitHub] spark issue #20306: [SPARK-23054][SQL][PYSPARK][FOLLOWUP] Use sqlType castin...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20306 **[Test build #86335 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86335/testReport)** for PR 20306 at commit [`74c1735`](https://github.com/apache/spark/commit/74c17353bb6372b123c5aee1b6d58a21de36f99a).
[GitHub] spark issue #20306: [SPARK-23054][SQL][PYSPARK][FOLLOWUP] Use sqlType castin...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20306 retest this please
[GitHub] spark pull request #20306: [SPARK-23054][SQL][PYSPARK][FOLLOWUP] Use sqlType...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20306#discussion_r162296434 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -838,6 +839,7 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String |$evPrim = $buffer.build(); """.stripMargin } + case pudt: PythonUserDefinedType => castToStringCode(pudt.sqlType, ctx) --- End diff -- Maybe we should do the same thing for Python UDT in the future and leave it for now.
[GitHub] spark pull request #20306: [SPARK-23054][SQL][PYSPARK][FOLLOWUP] Use sqlType...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/20306#discussion_r162293870 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -838,6 +839,7 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String |$evPrim = $buffer.build(); """.stripMargin } + case pudt: PythonUserDefinedType => castToStringCode(pudt.sqlType, ctx) --- End diff -- Is this what you suggested? https://github.com/apache/spark/compare/master...maropu:SPARK-23054-2 If so, it just dumps the internal structure:

```
scala> val df1 = Seq((1, Vectors.dense(Array(1.0, 2.0, 3.0)))).toDF("a", "b")
scala> df1.selectExpr("CAST(b AS STRING)").show(false)
+----------------------+
|b                     |
+----------------------+
|[1,,, [1.0, 2.0, 3.0]]|
+----------------------+

scala> val df2 = Seq((1, Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0)))).toDF("a", "b")
scala> df2.selectExpr("CAST(b AS STRING)").show(false)
+--------------------------+
|b                         |
+--------------------------+
|[0, 3, [0, 2], [1.0, 3.0]]|
+--------------------------+
```
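To see why the cast output in the comment above looks like a dump, here is a minimal Python sketch of the two steps involved. It assumes the four-field layout suggested by that output (type, size, indices, values, with type 1 for dense and 0 for sparse, and unused fields left NULL) and a struct-to-string rule that brackets the fields, separates them with commas, and leaves NULL fields empty. The function names are hypothetical; this models the behaviour, it is not Spark's code.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical stand-in for a row shaped like the vector's storage struct:
# (type, size, indices, values); None models a SQL NULL field.
Row = Tuple[int, Optional[int], Optional[List[int]], Optional[List[float]]]


def serialize_dense(values: Sequence[float]) -> Row:
    # Dense vectors: type = 1, size and indices left NULL.
    return (1, None, None, list(values))


def serialize_sparse(size: int, indices: Sequence[int], values: Sequence[float]) -> Row:
    # Sparse vectors: type = 0, all four fields populated.
    return (0, size, list(indices), list(values))


def to_str(value) -> str:
    # Arrays render as "[a, b, ...]"; scalars use Python's str().
    if isinstance(value, list):
        return "[" + ", ".join(to_str(v) for v in value) + "]"
    return str(value)


def struct_to_string(row: Sequence) -> str:
    # Fields are separated by ","; a space precedes each non-null field
    # after the first; NULL fields contribute nothing.
    parts = ["["]
    for i, field in enumerate(row):
        if i > 0:
            parts.append(",")
            if field is not None:
                parts.append(" ")
        if field is not None:
            parts.append(to_str(field))
    parts.append("]")
    return "".join(parts)


print(struct_to_string(serialize_dense([1.0, 2.0, 3.0])))        # [1,,, [1.0, 2.0, 3.0]]
print(struct_to_string(serialize_sparse(3, [0, 2], [1.0, 3.0])))  # [0, 3, [0, 2], [1.0, 3.0]]
```

The empty slots between commas in the dense case are the NULL `size` and `indices` fields, which is exactly the "internal structure" leaking into the string.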
[GitHub] spark pull request #20306: [SPARK-23054][SQL][PYSPARK][FOLLOWUP] Use sqlType...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/20306#discussion_r162293629 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -838,6 +839,7 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String |$evPrim = $buffer.build(); """.stripMargin } + case pudt: PythonUserDefinedType => castToStringCode(pudt.sqlType, ctx) --- End diff -- Yes, casting to string works. By the way, as for `VectorUDT`, it seems `DenseVector` and `SparseVector` override `toString()` on purpose, at least for `show()`(?): https://github.com/apache/spark/blob/74c17353bb6372b123c5aee1b6d58a21de36f99a/python/pyspark/ml/classification.py#L1497-L1503 If we also use the string cast for `show()`, the result will look like:

```
+-----------------+----------+
|         features|prediction|
+-----------------+----------+
|[1,,, [1.0, 0.0]]|       1.0|
|[1,,, [0.0, 0.0]]|       0.0|
+-----------------+----------+
```

I'm not sure we can change the string here.
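For contrast with the struct dump, a minimal sketch of the compact forms that `show()` prints today. It assumes the formats produced by the vectors' own `toString` (dense `[v0,v1,...]`, sparse `(size,[indices],[values])`, no spaces); the helper names are hypothetical and only model those strings.

```python
from typing import Sequence


def dense_to_string(values: Sequence[float]) -> str:
    # Assumed DenseVector format: comma-separated values, no spaces.
    return "[" + ",".join(str(v) for v in values) + "]"


def sparse_to_string(size: int, indices: Sequence[int], values: Sequence[float]) -> str:
    # Assumed SparseVector format: (size,[indices],[values]).
    inds = "[" + ",".join(str(i) for i in indices) + "]"
    vals = "[" + ",".join(str(v) for v in values) + "]"
    return "(" + ",".join([str(size), inds, vals]) + ")"


# The "features" column from the table above, rendered via toString-style formatting.
for vec in ([1.0, 0.0], [0.0, 0.0]):
    print(dense_to_string(vec))                   # [1.0,0.0] then [0.0,0.0]
print(sparse_to_string(3, [0, 2], [1.0, 3.0]))    # (3,[0,2],[1.0,3.0])
```

Switching `show()` to the string cast would replace `[1.0,0.0]` with `[1,,, [1.0, 0.0]]`, which is the user-visible change being questioned.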
[GitHub] spark pull request #20275: [SPARK-23085][ML] API parity for mllib.linalg.Vec...
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/20275#discussion_r162292944 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/VectorsSuite.scala --- @@ -113,6 +113,13 @@ class VectorsSuite extends SparkFunSuite with Logging { assert(vec.toArray === arr) } + test("zero-length sparse vector") { --- End diff -- While we're doing this, we may as well also add a test to `intercept` the exception for negative size (as per the other sparse vector construction tests), for both `ml` and `mllib`
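The suggestion pairs a positive test (size 0 is accepted) with a negative one (a negative size is rejected). A minimal Python `unittest` sketch of that pair, using a hypothetical `SparseVector` class rather than Spark's; in the Scala suites the negative case would use ScalaTest's `intercept`, and the pair would be repeated for `ml.linalg` and `mllib.linalg`.

```python
import unittest
from typing import List, Sequence


class SparseVector:
    """Hypothetical minimal sparse vector, used only to illustrate the checks."""

    def __init__(self, size: int, indices: Sequence[int], values: Sequence[float]):
        if size < 0:
            raise ValueError(f"size must be non-negative, got {size}")
        if len(indices) != len(values):
            raise ValueError("indices and values must have the same length")
        if any(i < 0 or i >= size for i in indices):
            raise ValueError("index out of range")
        self.size = size
        self.indices = list(indices)
        self.values = list(values)

    def to_array(self) -> List[float]:
        # Expand to a dense list, filling unspecified positions with 0.0.
        dense = [0.0] * self.size
        for i, v in zip(self.indices, self.values):
            dense[i] = v
        return dense


class SparseVectorConstructionTest(unittest.TestCase):
    def test_zero_length_sparse_vector(self):
        vec = SparseVector(0, [], [])
        self.assertEqual(vec.size, 0)
        self.assertEqual(vec.to_array(), [])

    def test_negative_size_is_rejected(self):
        # Counterpart of ScalaTest's intercept[...] { ... } block.
        with self.assertRaises(ValueError):
            SparseVector(-1, [], [])


if __name__ == "__main__":
    unittest.main(argv=["sparse_vector_sketch"], exit=False)
```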
[GitHub] spark pull request #20275: [SPARK-23085][ML] API parity for mllib.linalg.Vec...
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/20275#discussion_r162292520 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/VectorsSuite.scala --- @@ -113,6 +113,13 @@ class VectorsSuite extends SparkFunSuite with Logging { assert(vec.toArray === arr) } + test("zero-length sparse vector") { --- End diff -- We may as well add the same test to `ml.linalg`
[GitHub] spark issue #20309: [SPARK-23143][SS][PYTHON] Added python API for setting c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20309 **[Test build #86334 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86334/testReport)** for PR 20309 at commit [`5f905aa`](https://github.com/apache/spark/commit/5f905aabdbf8ae402c90769e6d7841a7a5d76d70).
[GitHub] spark issue #20309: [SPARK-23143][SS][PYTHON] Added python API for setting c...
Github user tdas commented on the issue: https://github.com/apache/spark/pull/20309 jenkins retest this please
[GitHub] spark issue #20310: revert [SPARK-10030] Use tags to control which tests to ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20310 **[Test build #86333 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86333/testReport)** for PR 20310 at commit [`b6c46b5`](https://github.com/apache/spark/commit/b6c46b5900bf1109f836674b8ba5ee3cb4712771).
[GitHub] spark issue #20313: [SPARK-22974][ML] Attach attributes to output column of ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20313 **[Test build #86332 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86332/testReport)** for PR 20313 at commit [`aeae308`](https://github.com/apache/spark/commit/aeae308055cd16d95ef9ff86df882ec1aa20).
[GitHub] spark pull request #20306: [SPARK-23054][SQL][PYSPARK][FOLLOWUP] Use sqlType...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20306#discussion_r162287902 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -838,6 +839,7 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String |$evPrim = $buffer.build(); """.stripMargin } + case pudt: PythonUserDefinedType => castToStringCode(pudt.sqlType, ctx) --- End diff -- Like the Python UDT, we recursively call `castToStringCode(pudt.sqlType, ctx)`; does it work?
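The idea under discussion is that a UDT needs no string-cast rule of its own: the converter is chosen by recursing on the type, and a UDT simply delegates to its storage (`sqlType`) type. A minimal Python sketch of that dispatch, with hypothetical type classes standing in for Spark's and only arrays of doubles supported. For a UDT stored as an array the output reads naturally; struct-backed types such as the vector UDT are where the replies in this thread see a problem.

```python
from dataclasses import dataclass
from typing import Any, Callable


# Hypothetical, simplified stand-ins for Spark SQL data types.
@dataclass(frozen=True)
class DoubleType:
    pass


@dataclass(frozen=True)
class ArrayType:
    element_type: Any


@dataclass(frozen=True)
class UserDefinedType:
    sql_type: Any  # the storage type that values are serialized into


def cast_to_string(data_type) -> Callable[[Any], str]:
    """Return a value -> str converter chosen by recursing on the data type."""
    if isinstance(data_type, UserDefinedType):
        # A UDT gets no rule of its own: delegate to the rule for its sqlType.
        return cast_to_string(data_type.sql_type)
    if isinstance(data_type, ArrayType):
        element_to_string = cast_to_string(data_type.element_type)
        return lambda arr: "[" + ", ".join(element_to_string(v) for v in arr) + "]"
    if isinstance(data_type, DoubleType):
        return str
    raise TypeError(f"unsupported type: {data_type!r}")


# A hypothetical point UDT stored as array<double>.
point_udt = UserDefinedType(ArrayType(DoubleType()))
print(cast_to_string(point_udt)([1.0, 2.5]))  # [1.0, 2.5]
```

Because the recursion is on types, UDTs nested inside arrays (or UDTs wrapping other UDTs) are handled without extra cases.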
[GitHub] spark pull request #20313: [SPARK-22974][ML] Attach attributes to output col...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/20313

[SPARK-22974][ML] Attach attributes to output column of CountVectorModel

## What changes were proposed in this pull request?

The output column from `CountVectorizerModel` lacks attributes, so a later transformer such as `Interaction` can raise an error because no attributes are available.

## How was this patch tested?

Added test.

Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 SPARK-22974

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20313.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #20313

commit aeae308055cd16d95ef9ff86df882ec1aa20
Author: Liang-Chi Hsieh
Date: 2018-01-18T09:25:54Z
Attach attributes to output column of CountVectorModel.
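The failure mode described is that a downstream stage reads the vector column's ML attribute metadata to learn its layout and finds none. A minimal sketch of that dependency, assuming a simplified metadata dict loosely modeled on Spark ML's `ml_attr` column metadata (`num_attrs` plus per-index attributes). `attach_attributes` and `interaction_width` are hypothetical helpers, not Spark APIs, and naming each attribute after its vocabulary term is an assumption rather than what the patch necessarily does.

```python
from typing import Dict, List, Optional


def attach_attributes(vocabulary: List[str]) -> Dict:
    """Hypothetical: build column metadata with one numeric attribute per term."""
    attrs = [{"idx": i, "name": term} for i, term in enumerate(vocabulary)]
    return {"ml_attr": {"attrs": {"numeric": attrs}, "num_attrs": len(vocabulary)}}


def interaction_width(metadata: Optional[Dict]) -> int:
    """Hypothetical downstream stage that needs the vector size from metadata."""
    ml_attr = (metadata or {}).get("ml_attr")
    if ml_attr is None or "num_attrs" not in ml_attr:
        raise ValueError("vector column has no ML attributes; cannot infer its size")
    return ml_attr["num_attrs"]


print(interaction_width(attach_attributes(["a", "b", "c"])))  # 3
try:
    interaction_width({})  # an output column with no attributes attached
except ValueError as err:
    print("error:", err)
```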
[GitHub] spark issue #20023: [SPARK-22036][SQL] Decimal multiplication with high prec...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20023 LGTM, pending jenkins
[GitHub] spark issue #20311: [SPARK-23144][SS] Added console sink for continuous proc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20311 **[Test build #86331 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86331/testReport)** for PR 20311 at commit [`6f69669`](https://github.com/apache/spark/commit/6f69669c6b34a6d6bbcd11c3fb635262fe802d28).
[GitHub] spark issue #20312: [Docs] change to dataset for java code in structured-str...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20312 Can one of the admins verify this patch?
[GitHub] spark issue #20311: [SPARK-23144][SS] Added console sink for continuous proc...
Github user tdas commented on the issue: https://github.com/apache/spark/pull/20311 @jose-torres PTAL
[GitHub] spark pull request #20312: [Docs] change to dataset for java code in structu...
GitHub user brandonJY opened a pull request: https://github.com/apache/spark/pull/20312

[Docs] change to dataset for java code in structured-streaming-kafka-integration document

## What changes were proposed in this pull request?

In the latest structured-streaming-kafka-integration document, the Java code example for Kafka integration uses `DataFrame`; shouldn't it be changed to `Dataset`?

## How was this patch tested?

A manual test was performed on the updated Java example code with Spark 2.2.1 and Kafka 1.0.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/brandonJY/spark patch-2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20312.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #20312

commit 10afff276d23cbe98f8e4187791fb1358eff15fb
Author: brandonJY
Date: 2018-01-18T08:57:56Z
[Docs] change to dataset for java code in structured-streaming-kafka-integration document In latest structured-streaming-kafka-integration document, Java code example for Kafka integration is using `DataFrame`, shouldn't it be changed to `DataSet`?
[GitHub] spark pull request #20311: [SPARK-23144][SS] Added console sink for continuo...
GitHub user tdas opened a pull request: https://github.com/apache/spark/pull/20311

[SPARK-23144][SS] Added console sink for continuous processing

## What changes were proposed in this pull request?

Refactored ConsoleWriter into ConsoleMicrobatchWriter and ConsoleContinuousWriter.

## How was this patch tested?

new unit test

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tdas/spark SPARK-23144

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20311.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #20311

commit 6f69669c6b34a6d6bbcd11c3fb635262fe802d28
Author: Tathagata Das
Date: 2018-01-18T09:07:00Z
added console sink for continuous processing
[GitHub] spark issue #19492: [SPARK-22228][SQL] Add support for array...
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19492 @viirya did you have a chance to look at this? Thanks.
[GitHub] spark issue #20309: [SPARK-23143][SS][PYTHON] Added python API for setting c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20309 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86329/ Test FAILed.
[GitHub] spark issue #20309: [SPARK-23143][SS][PYTHON] Added python API for setting c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20309 **[Test build #86329 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86329/testReport)** for PR 20309 at commit [`5f905aa`](https://github.com/apache/spark/commit/5f905aabdbf8ae402c90769e6d7841a7a5d76d70).

* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20309: [SPARK-23143][SS][PYTHON] Added python API for setting c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20309 Merged build finished. Test FAILed.
[GitHub] spark issue #20023: [SPARK-22036][SQL] Decimal multiplication with high prec...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20023 **[Test build #86330 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86330/testReport)** for PR 20023 at commit [`b4b0350`](https://github.com/apache/spark/commit/b4b0350dea09db897b70485ef1fad41a742eae30).
[GitHub] spark pull request #20306: [SPARK-23054][SQL][PYSPARK][FOLLOWUP] Use sqlType...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/20306#discussion_r162277049 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -838,6 +839,7 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String |$evPrim = $buffer.build(); """.stripMargin } + case pudt: PythonUserDefinedType => castToStringCode(pudt.sqlType, ctx) --- End diff -- But `VectorUDT.sqlType` has a non-array format: https://github.com/apache/spark/blob/1c76a91e5fae11dcb66c453889e587b48039fdc9/mllib/src/main/scala/org/apache/spark/ml/linalg/VectorUDT.scala#L88 In this case, how do we convert `VectorUDT` data into array-like strings (e.g., [0, 1, 2, ...])?
[GitHub] spark issue #20309: [SPARK-23143][SS][PYTHON] Added python API for setting c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20309 **[Test build #86329 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86329/testReport)** for PR 20309 at commit [`5f905aa`](https://github.com/apache/spark/commit/5f905aabdbf8ae402c90769e6d7841a7a5d76d70).
[GitHub] spark issue #20305: [SPARK-23140][SQL] Add DataSourceV2Strategy to Hive Sess...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20305 **[Test build #86328 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86328/testReport)** for PR 20305 at commit [`a978dcc`](https://github.com/apache/spark/commit/a978dcc3052f0df4485594c6cd4a944d8b6dab5e).
[GitHub] spark pull request #20306: [SPARK-23054][SQL][PYSPARK][FOLLOWUP] Use sqlType...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20306#discussion_r162275882 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -838,6 +839,7 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String |$evPrim = $buffer.build(); """.stripMargin } + case pudt: PythonUserDefinedType => castToStringCode(pudt.sqlType, ctx) --- End diff -- New thought: since UDT is not finalized yet (internal only), the only thing we care about is having a reasonable string representation. It's unclear whether every UDT class has a reasonable `toString`, and `UDT.deserialize` may be pretty slow. How about we always use `UDT.sqlType` for both string casting and `showString`?
[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20276 **[Test build #86327 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86327/testReport)** for PR 20276 at commit [`d0bdddf`](https://github.com/apache/spark/commit/d0bdddfffc18258ba1536c9cff4ea0856026094c).
[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20276 retest this please
[GitHub] spark pull request #20305: [SPARK-23140][SQL] Add DataSourceV2Strategy to Hi...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/20305#discussion_r162274760 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionStateBuilder.scala --- @@ -98,20 +98,7 @@ class HiveSessionStateBuilder(session: SparkSession, parentState: Option[Session override def extraPlanningStrategies: Seq[Strategy] = super.extraPlanningStrategies ++ customPlanningStrategies - override def strategies: Seq[Strategy] = { -experimentalMethods.extraStrategies ++ - extraPlanningStrategies ++ Seq( - FileSourceStrategy, - DataSourceStrategy(conf), - SpecialLimits, - InMemoryScans, - HiveTableScans, - Scripts, - Aggregation, - JoinSelection, - BasicOperators -) - } + override def strategies: Seq[Strategy] = Seq(HiveTableScans, Scripts) ++ super.strategies --- End diff -- OK, let me update it.
[GitHub] spark pull request #20305: [SPARK-23140][SQL] Add DataSourceV2Strategy to Hi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20305#discussion_r162274134 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionStateBuilder.scala --- @@ -98,20 +98,7 @@ class HiveSessionStateBuilder(session: SparkSession, parentState: Option[Session override def extraPlanningStrategies: Seq[Strategy] = super.extraPlanningStrategies ++ customPlanningStrategies - override def strategies: Seq[Strategy] = { -experimentalMethods.extraStrategies ++ - extraPlanningStrategies ++ Seq( - FileSourceStrategy, - DataSourceStrategy(conf), - SpecialLimits, - InMemoryScans, - HiveTableScans, - Scripts, - Aggregation, - JoinSelection, - BasicOperators -) - } + override def strategies: Seq[Strategy] = Seq(HiveTableScans, Scripts) ++ super.strategies --- End diff -- This breaks the assumption that `experimentalMethods.extraStrategies` should always run first. I think we can just do:
```
override def extraPlanningStrategies: Seq[Strategy] =
  super.extraPlanningStrategies ++ customPlanningStrategies ++ Seq(HiveTableScans, Scripts)
```
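A toy model of why position in the strategy list matters, assuming a planner that tries strategies in order and keeps the first candidate plan produced (similar in spirit to Spark's planner, but simplified). All names here (`StrategyOrderSketch`, `LogicalNode`, `experimental`, `hiveTableScans`, `fileSource`, and the `*Exec` strings) are hypothetical stand-ins, not Spark classes.

```scala
// Toy model only: strategies are functions from a node to candidate plan names.
object StrategyOrderSketch {
  // A toy logical node; real plans are trees, this is just a tag.
  final case class LogicalNode(kind: String)

  // A "strategy" returns zero or more candidate physical plans (names only).
  type Strategy = LogicalNode => Seq[String]

  // Hypothetical user-supplied strategy that is expected to win when it applies.
  val experimental: Strategy =
    n => if (n.kind == "hiveTable") Seq("UserOverrideExec") else Nil
  // Hypothetical stand-ins for Hive and file-source planning.
  val hiveTableScans: Strategy =
    n => if (n.kind == "hiveTable") Seq("HiveTableScanExec") else Nil
  val fileSource: Strategy =
    n => if (n.kind == "file") Seq("FileSourceScanExec") else Nil

  // Try strategies in order and keep the first candidate produced.
  def plan(strategies: Seq[Strategy], node: LogicalNode): Option[String] = {
    val candidates = strategies.iterator.flatMap(s => s(node))
    if (candidates.hasNext) Some(candidates.next()) else None
  }

  // Suggested shape: user strategies first, then extra planning strategies
  // (with the Hive ones appended), then the built-in ones.
  val suggested: Seq[Strategy] = Seq(experimental) ++ Seq(hiveTableScans) ++ Seq(fileSource)

  // Shape from the diff under review: Hive strategies prepended to everything.
  val prepended: Seq[Strategy] = Seq(hiveTableScans) ++ Seq(experimental, fileSource)

  def main(args: Array[String]): Unit = {
    val node = LogicalNode("hiveTable")
    println(plan(suggested, node)) // prints Some(UserOverrideExec)
    println(plan(prepended, node)) // prints Some(HiveTableScanExec)
  }
}
```

For a Hive table node, the suggested ordering lets the user strategy win, while prepending the Hive strategies silently shadows it, which is the broken assumption described in the review comment.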
[GitHub] spark issue #20305: [SPARK-23140][SQL] Add DataSourceV2Strategy to Hive Sess...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20305 **[Test build #86326 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86326/testReport)** for PR 20305 at commit [`094b7eb`](https://github.com/apache/spark/commit/094b7ebbaf7bfe75e706cf42565f0c077938e821).
[GitHub] spark issue #20306: [SPARK-23054][SQL][PYSPARK][FOLLOWUP] Use sqlType castin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20306 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86314/ Test FAILed.
[GitHub] spark issue #20306: [SPARK-23054][SQL][PYSPARK][FOLLOWUP] Use sqlType castin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20306 Merged build finished. Test FAILed.
[GitHub] spark issue #20305: [SPARK-23140][SQL] Add DataSourceV2Strategy to Hive Sess...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20305 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86318/ Test FAILed.
[GitHub] spark issue #20298: [SPARK-22976][Core]: Cluster mode driver dir removed whi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20298 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86315/ Test FAILed.
[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20276 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86324/ Test FAILed.
[GitHub] spark issue #20306: [SPARK-23054][SQL][PYSPARK][FOLLOWUP] Use sqlType castin...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20306 **[Test build #86314 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86314/testReport)** for PR 20306 at commit [`74c1735`](https://github.com/apache/spark/commit/74c17353bb6372b123c5aee1b6d58a21de36f99a). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20276 Merged build finished. Test FAILed.
[GitHub] spark issue #20305: [SPARK-23140][SQL] Add DataSourceV2Strategy to Hive Sess...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20305 Merged build finished. Test FAILed.
[GitHub] spark issue #20298: [SPARK-22976][Core]: Cluster mode driver dir removed whi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20298 **[Test build #86315 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86315/testReport)** for PR 20298 at commit [`38916f7`](https://github.com/apache/spark/commit/38916f769252938fbce891cf1d21972e50a01181). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20298: [SPARK-22976][Core]: Cluster mode driver dir removed whi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20298 Merged build finished. Test FAILed.
[GitHub] spark issue #20305: [SPARK-23140][SQL] Add DataSourceV2Strategy to Hive Sess...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20305 **[Test build #86318 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86318/testReport)** for PR 20305 at commit [`f17b44d`](https://github.com/apache/spark/commit/f17b44de6e4d2ece008d3856fdcc037cce7dd147). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20276 **[Test build #86324 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86324/testReport)** for PR 20276 at commit [`d0bdddf`](https://github.com/apache/spark/commit/d0bdddfffc18258ba1536c9cff4ea0856026094c). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #20305: [SPARK-23140][SQL] Add DataSourceV2Strategy to Hi...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/20305#discussion_r162270415 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionStateBuilder.scala --- @@ -101,6 +102,7 @@ class HiveSessionStateBuilder(session: SparkSession, parentState: Option[Session override def strategies: Seq[Strategy] = { experimentalMethods.extraStrategies ++ extraPlanningStrategies ++ Seq( --- End diff -- Looks like the ordering matters. If I put the Hive-related strategies at the end, some unit tests fail.