[GitHub] spark issue #18174: [SPARK-20950][CORE]add a new config to diskWriteBufferSi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18174 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79123/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18174: [SPARK-20950][CORE]add a new config to diskWriteBufferSi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18174 Merged build finished. Test PASSed.
[GitHub] spark issue #18174: [SPARK-20950][CORE]add a new config to diskWriteBufferSi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18174

**[Test build #79123 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79123/testReport)** for PR 18174 at commit [`3efc743`](https://github.com/apache/spark/commit/3efc7433802155c957e78d23abf4847cde8e0d07).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18520: [SPARK-21295] [SQL] Use qualified names in error message...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18520 Merged build finished. Test PASSed.
[GitHub] spark issue #18520: [SPARK-21295] [SQL] Use qualified names in error message...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18520 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79124/
[GitHub] spark issue #18523: [SPARK-21285][ML] VectorAssembler reports the column nam...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18523 Can one of the admins verify this patch?
[GitHub] spark issue #18520: [SPARK-21295] [SQL] Use qualified names in error message...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18520

**[Test build #79124 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79124/testReport)** for PR 18520 at commit [`0b9f860`](https://github.com/apache/spark/commit/0b9f860cee44bb06feeb291b566243e139cbaf28).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #18523: [SPARK-21285][ML] VectorAssembler reports the col...
GitHub user facaiy opened a pull request: https://github.com/apache/spark/pull/18523

[SPARK-21285][ML] VectorAssembler reports the column name of unsupported data type

## What changes were proposed in this pull request?

Add the column name in the exception which is raised by unsupported data type.

## How was this patch tested?

- [ ] pass all tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/facaiy/spark ENH/vectorassembler_add_col

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18523.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18523

commit 95dbf6c7b287d0010af9de377ff6b93dec760808
Author: Yan Facai (颜发才)
Date: 2017-07-04T05:42:07Z

    ENH: report the name of missing column
[GitHub] spark issue #18519: [SPARK-16742] kerberos
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18519 Not a big deal but could we fix the PR title to be a bit more descriptive?
[GitHub] spark issue #17848: [SPARK-20586] [SQL] Add deterministic and distinctLike t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17848 Merged build finished. Test FAILed.
[GitHub] spark issue #18511: [SPARK-21286][Test] Modified a unit test
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18511 Not a big deal but I would like to suggest to fix the title to be more descriptive.
[GitHub] spark issue #17848: [SPARK-20586] [SQL] Add deterministic and distinctLike t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17848

**[Test build #79130 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79130/testReport)** for PR 17848 at commit [`0aa6475`](https://github.com/apache/spark/commit/0aa64755009701c1d37de27c48926b4f46373fa8).
* This patch **fails Python style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18511: [SPARK-21286][Test] Modified a unit test
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18511 Not a big deal but I would like to suggest to fix the title to be more descriptive.
[GitHub] spark issue #17848: [SPARK-20586] [SQL] Add deterministic and distinctLike t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17848 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79130/
[GitHub] spark issue #17848: [SPARK-20586] [SQL] Add deterministic and distinctLike t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17848 **[Test build #79130 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79130/testReport)** for PR 17848 at commit [`0aa6475`](https://github.com/apache/spark/commit/0aa64755009701c1d37de27c48926b4f46373fa8).
[GitHub] spark pull request #18522: [MINOR]Closes stream and releases any system reso...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18522#discussion_r125392441

--- Diff: core/src/test/scala/org/apache/spark/util/UtilsSuite.scala ---
@@ -488,7 +488,7 @@ class UtilsSuite extends SparkFunSuite with ResetSystemProperties with Logging {
   test("resolveURIs with multiple paths") {
     def assertResolves(before: String, after: String): Unit = {
-      assume(before.split(",").length > 1)
+      assume(before.split(",").length >= 1)
--- End diff --

BTW, why do we fix this?
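One way to see what the reviewer is questioning: `String.split` returns at least one element for any non-empty input, so relaxing the assumption from `> 1` to `>= 1` makes it nearly vacuous. A standalone sketch (not the UtilsSuite code):

```scala
// Standalone sketch, not the actual UtilsSuite code: shows why
// `length >= 1` is a much weaker precondition than `length > 1`.
object SplitAssumptionDemo {
  def main(args: Array[String]): Unit = {
    val single = "/tmp/path1"              // one path, no comma
    val multiple = "/tmp/path1,/tmp/path2" // two comma-separated paths

    // A non-empty string always splits into at least one element...
    assert(single.split(",").length >= 1)
    assert(single.split(",").length == 1)
    // ...but only the multi-path input satisfies the original `> 1` check,
    // which is what made the assume meaningful for "multiple paths".
    assert(multiple.split(",").length > 1)
    println("ok")
  }
}
```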
[GitHub] spark issue #18521: [SPARK-19507][SPARK-21296][PYTHON] Avoid per-record type...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18521 Merged build finished. Test PASSed.
[GitHub] spark issue #18521: [SPARK-19507][SPARK-21296][PYTHON] Avoid per-record type...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18521

**[Test build #79128 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79128/testReport)** for PR 18521 at commit [`5b80a8b`](https://github.com/apache/spark/commit/5b80a8b92273e9abf6ce8b28dcd70fbb32d4613c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18521: [SPARK-19507][SPARK-21296][PYTHON] Avoid per-record type...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18521 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79128/
[GitHub] spark issue #18511: [SPARK-21286][Test] Modified a unit test
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18511 Merged build finished. Test PASSed.
[GitHub] spark issue #18511: [SPARK-21286][Test] Modified a unit test
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18511 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79122/
[GitHub] spark issue #18511: [SPARK-21286][Test] Modified a unit test
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18511

**[Test build #79122 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79122/testReport)** for PR 18511 at commit [`1d098ab`](https://github.com/apache/spark/commit/1d098abb7c087fa26c3cae1eb8c8dd8ffbe8530b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17985: Add "full_outer" name to join types
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17985 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79126/
[GitHub] spark issue #17985: Add "full_outer" name to join types
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17985 Merged build finished. Test FAILed.
[GitHub] spark issue #17985: Add "full_outer" name to join types
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17985

**[Test build #79126 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79126/testReport)** for PR 17985 at commit [`9fc9a0a`](https://github.com/apache/spark/commit/9fc9a0ad567dfb28d22d94321fcef0ea3b1ae73b).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18469: [SPARK-21256] [SQL] Add withSQLConf to Catalyst Test
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18469 **[Test build #79129 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79129/testReport)** for PR 18469 at commit [`7431a8d`](https://github.com/apache/spark/commit/7431a8df09fada093d47abb49079de81cdbd1d9e).
[GitHub] spark issue #18469: [SPARK-21256] [SQL] Add withSQLConf to Catalyst Test
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18469 retest this please
[GitHub] spark pull request #18159: [SPARK-20703][SQL] Associate metrics with data wr...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18159#discussion_r125390211

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/commands.scala ---
@@ -47,10 +56,73 @@ trait RunnableCommand extends logical.Command {
 }

 /**
+ * A special `RunnableCommand` which writes data out and updates metrics.
+ */
+trait DataWritingCommand extends RunnableCommand {
+
+  override lazy val metrics: Map[String, SQLMetric] = {
+    val sparkContext = SparkContext.getActive.get
+    Map(
+      "avgTime" -> SQLMetrics.createMetric(sparkContext, "average writing time (ms)"),
+      "numFiles" -> SQLMetrics.createMetric(sparkContext, "number of written files"),
+      "numOutputBytes" -> SQLMetrics.createMetric(sparkContext, "bytes of written output"),
+      "numOutputRows" -> SQLMetrics.createMetric(sparkContext, "number of output rows"),
+      "numParts" -> SQLMetrics.createMetric(sparkContext, "number of dynamic part")
+    )
+  }
+
+  /**
+   * Callback function that update metrics collected from the writing operation.
+   */
+  protected def updateWritingMetrics(writeSummaries: Seq[ExecutedWriteSummary]): Unit = {
+    val sparkContext = SparkContext.getActive.get
+    var numPartitions = 0
+    var numFiles = 0
+    var totalNumBytes: Long = 0L
+    var totalNumOutput: Long = 0L
+    var totalWritingTime: Long = 0L
+    var numFilesNonZeroWritingTime = 0
+
+    writeSummaries.foreach { summary =>
+      numPartitions += summary.updatedPartitions.size
+      numFiles += summary.numOutputFile
+      totalNumBytes += summary.numOutputBytes
+      totalNumOutput += summary.numOutputRows
+      totalWritingTime += summary.totalWritingTime
+      numFilesNonZeroWritingTime += summary.numFilesWithNonZeroWritingTime
+    }
+
+    // We only count non-zero writing time when averaging total writing time.
+    // The time for writing individual file can be zero if it's less than 1 ms. Zero values can
--- End diff --

I guess this should be rare?
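The averaging rule being discussed can be sketched in standalone form (an illustration, not the PR's actual implementation): files whose measured writing time rounds down to 0 ms are excluded from the denominator so they don't drag the average toward zero.

```scala
// Standalone sketch of "only count non-zero writing time when averaging":
// the per-file timings here are hypothetical, in milliseconds, and zeros
// stand for sub-millisecond writes.
object AvgWritingTimeDemo {
  def avgNonZero(timesMs: Seq[Long]): Long = {
    val nonZero = timesMs.filter(_ > 0)
    if (nonZero.isEmpty) 0L else nonZero.sum / nonZero.length
  }

  def main(args: Array[String]): Unit = {
    val timesMs = Seq(0L, 12L, 0L, 8L)
    // A naive average over all 4 files would be 5 ms; excluding the
    // sub-millisecond files gives (12 + 8) / 2 = 10 ms.
    assert(avgNonZero(timesMs) == 10L)
    // All-zero input degrades gracefully instead of dividing by zero.
    assert(avgNonZero(Seq(0L, 0L)) == 0L)
    println("ok")
  }
}
```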
[GitHub] spark issue #18159: [SPARK-20703][SQL] Associate metrics with data writes on...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18159 LGTM except some minor comments, thanks for working on it!
[GitHub] spark issue #18469: [SPARK-21256] [SQL] Add withSQLConf to Catalyst Test
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18469 Merged build finished. Test FAILed.
[GitHub] spark issue #18469: [SPARK-21256] [SQL] Add withSQLConf to Catalyst Test
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18469 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79125/
[GitHub] spark pull request #18159: [SPARK-20703][SQL] Associate metrics with data wr...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18159#discussion_r125389877

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/commands.scala ---
@@ -47,10 +56,73 @@ trait RunnableCommand extends logical.Command {
 }

 /**
+ * A special `RunnableCommand` which writes data out and updates metrics.
+ */
+trait DataWritingCommand extends RunnableCommand {
+
+  override lazy val metrics: Map[String, SQLMetric] = {
+    val sparkContext = SparkContext.getActive.get
+    Map(
+      "avgTime" -> SQLMetrics.createMetric(sparkContext, "average writing time (ms)"),
+      "numFiles" -> SQLMetrics.createMetric(sparkContext, "number of written files"),
+      "numOutputBytes" -> SQLMetrics.createMetric(sparkContext, "bytes of written output"),
+      "numOutputRows" -> SQLMetrics.createMetric(sparkContext, "number of output rows"),
+      "numParts" -> SQLMetrics.createMetric(sparkContext, "number of dynamic part")
+    )
+  }
+
+  /**
+   * Callback function that update metrics collected from the writing operation.
+   */
+  protected def updateWritingMetrics(writeSummaries: Seq[ExecutedWriteSummary]): Unit = {
+    val sparkContext = SparkContext.getActive.get
+    var numPartitions = 0
+    var numFiles = 0
+    var totalNumBytes: Long = 0L
+    var totalNumOutput: Long = 0L
+    var totalWritingTime: Long = 0L
+    var numFilesNonZeroWritingTime = 0
+
+    writeSummaries.foreach { summary =>
+      numPartitions += summary.updatedPartitions.size
+      numFiles += summary.numOutputFile
+      totalNumBytes += summary.numOutputBytes
+      totalNumOutput += summary.numOutputRows
+      totalWritingTime += summary.totalWritingTime
+      numFilesNonZeroWritingTime += summary.numFilesWithNonZeroWritingTime
+    }
+
+    // We only count non-zero writing time when averaging total writing time.
+    // The time for writing individual file can be zero if it's less than 1 ms. Zero values can
--- End diff --

This only happens if a partition is very small, right?
[GitHub] spark issue #18469: [SPARK-21256] [SQL] Add withSQLConf to Catalyst Test
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18469

**[Test build #79125 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79125/testReport)** for PR 18469 at commit [`7431a8d`](https://github.com/apache/spark/commit/7431a8df09fada093d47abb49079de81cdbd1d9e).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #18159: [SPARK-20703][SQL] Associate metrics with data wr...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18159#discussion_r125389753

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala ---
@@ -314,21 +339,40 @@ object FileFormatWriter extends Logging {
         recordsInFile = 0
         releaseResources()
+        numOutputRows += recordsInFile
         newOutputWriter(fileCounter)
       }

       val internalRow = iter.next()
+      val startTime = System.nanoTime()
       currentWriter.write(internalRow)
+      timeOnCurrentFile += (System.nanoTime() - startTime)
--- End diff --

instead of tracking the time here, how about we do it in `newOutputWriter`?
```
var startTime = -1
def newOutputWriter {
  if (startTime == -1) {
    startTime = System.nanoTime()
  } else {
    val currentTime = System.nanoTime()
    totalWritingTime += currentTime - startTime
    startTime = currentTime
  }
}
```
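The idea in this suggestion is to account time at writer rollovers rather than around every record write. A standalone sketch of that pattern (names here are hypothetical stand-ins, not the FileFormatWriter internals):

```scala
// Standalone sketch of the "measure time between writer rollovers" idea:
// each call to newOutputWriter closes out the timing window of the
// previous file instead of timing every individual record write.
object RolloverTimingDemo {
  private var startTime: Long = -1L
  var totalWritingTime: Long = 0L // accumulated nanoseconds across files

  // Hypothetical stand-in for FileFormatWriter's newOutputWriter.
  def newOutputWriter(): Unit = {
    val currentTime = System.nanoTime()
    if (startTime != -1L) {
      totalWritingTime += currentTime - startTime
    }
    startTime = currentTime
  }

  def main(args: Array[String]): Unit = {
    newOutputWriter()   // first file: starts the clock, accounts nothing yet
    Thread.sleep(5)     // simulate writing records to the first file
    newOutputWriter()   // rollover: accounts the first file's elapsed time
    assert(totalWritingTime > 0L)
    println("ok")
  }
}
```

One trade-off of this pattern is that the window also includes any non-write work done between rollovers, which is presumably why it was only floated as a suggestion here.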
[GitHub] spark issue #18501: [SPARK-20256][SQL] SessionState should be created more l...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18501 Hi, @cloud-fan and @gatorsmile. I'm back to this PR. Although this introduces a new concept, it could be a solution in the current relation between `SparkContext` and `SparkSession`. What do you think about this approach?
[GitHub] spark issue #18521: [SPARK-19507][SPARK-21296][PYTHON] Avoid per-record type...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18521 **[Test build #79128 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79128/testReport)** for PR 18521 at commit [`5b80a8b`](https://github.com/apache/spark/commit/5b80a8b92273e9abf6ce8b28dcd70fbb32d4613c).
[GitHub] spark pull request #18159: [SPARK-20703][SQL] Associate metrics with data wr...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18159#discussion_r125389285

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/commands.scala ---

```
@@ -47,10 +56,73 @@ trait RunnableCommand extends logical.Command {
 }

 /**
+ * A special `RunnableCommand` which writes data out and updates metrics.
+ */
+trait DataWritingCommand extends RunnableCommand {
```

--- End diff --

let's move it to a new file
[GitHub] spark issue #18501: [SPARK-20256][SQL] SessionState should be created more l...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18501 **[Test build #79127 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79127/testReport)** for PR 18501 at commit [`137f252`](https://github.com/apache/spark/commit/137f252c79f3f044507a453320a66ac6d0cb6334).
[GitHub] spark pull request #17848: [SPARK-20586] [SQL] Add deterministic and distinc...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/17848#discussion_r125388937

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala ---

```
@@ -85,8 +94,9 @@ case class UserDefinedFunction protected[sql] (
  * @since 2.3.0
  */
 def withName(name: String): this.type = {
-  this._nameOption = Option(name)
-  this
+  val udf = copyAll()
+  udf._nameOption = Option(name)
```

--- End diff --

yea, I know. I just meant we added an interface `newInstance(name, nullable, determinism)` there.
[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17865 Build finished. Test PASSed.
[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17865 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79118/
[GitHub] spark pull request #18468: [SPARK-20873][SQL] Enhance ColumnVector to suppor...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/18468#discussion_r125388507

--- Diff: core/src/main/java/org/apache/spark/memory/MemoryMode.java ---

```
@@ -22,5 +22,6 @@
 @Private
 public enum MemoryMode {
   ON_HEAP,
-  OFF_HEAP
+  OFF_HEAP,
+  ON_HEAP_CACHEDBATCH
```

--- End diff --

The current implementation relies on the memory mode to decide which kind of `ColumnVector` to allocate. If we do not add a new memory mode, I think we have to introduce additional conditional branches in the getters/setters.
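The design point kiszk raises is that dispatching on the memory mode once, at allocation time, keeps the per-access getter/setter path free of conditionals. A toy Python sketch of that allocation-time dispatch (the class names and string results are illustrative, not Spark's actual `ColumnVector` hierarchy; `OFF_HEAP` is omitted from the table for brevity):

```python
from enum import Enum

class MemoryMode(Enum):
    ON_HEAP = 1
    OFF_HEAP = 2
    ON_HEAP_CACHEDBATCH = 3  # the new mode proposed in the PR

class OnHeapVector:
    # Plain array-backed read; no branch on storage kind in here.
    def get_int(self, row):
        return "on-heap read of row %d" % row

class OnHeapCachedBatchVector:
    # Compressed-batch read; again no branch on storage kind in here.
    def get_int(self, row):
        return "decompress-then-read of row %d" % row

# Allocation picks the implementation once, keyed by memory mode, so the
# per-access getters stay branch-free.
_IMPLS = {
    MemoryMode.ON_HEAP: OnHeapVector,
    MemoryMode.ON_HEAP_CACHEDBATCH: OnHeapCachedBatchVector,
}

def allocate(mode):
    return _IMPLS[mode]()

vec = allocate(MemoryMode.ON_HEAP_CACHEDBATCH)
print(vec.get_int(0))  # decompress-then-read of row 0
```

The alternative kiszk argues against would be a single vector class whose `get_int` checks the mode on every call, which puts a branch on the hottest path in columnar execution.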
[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17865 **[Test build #79118 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79118/testReport)** for PR 17865 at commit [`f17f332`](https://github.com/apache/spark/commit/f17f332dd97b948f8dd31eb2b18c1e11dc7fead0). * This patch passes all tests. * This patch **does not merge cleanly**. * This patch adds no public classes.
[GitHub] spark pull request #18468: [SPARK-20873][SQL] Enhance ColumnVector to suppor...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/18468#discussion_r125388308

--- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapCachedBatch.java ---

```
@@ -0,0 +1,403 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution.vectorized;
+
+import java.nio.ByteBuffer;
+
+import org.apache.spark.memory.MemoryMode;
+import org.apache.spark.sql.catalyst.expressions.UnsafeRow;
+import org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder;
+import org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter;
+import org.apache.spark.sql.execution.columnar.*;
+import org.apache.spark.sql.types.*;
+import org.apache.spark.unsafe.types.UTF8String;
+
+/**
+ * A column backed by an in memory JVM array.
+ */
+public final class OnHeapCachedBatch extends ColumnVector implements java.io.Serializable {
+
+  // keep compressed data
+  private byte[] buffer;
+
+  // whether a row is already extracted or not. If extractTo() is called, set true
+  // e.g. when isNullAt() and getInt() are called, extractTo() must be called only once
+  private boolean[] calledExtractTo;
+
+  // a row where the compressed data is extracted
+  private transient UnsafeRow unsafeRow;
+  private transient BufferHolder bufferHolder;
+  private transient UnsafeRowWriter rowWriter;
+  private transient MutableUnsafeRow mutableRow;
+
+  // accessor for a column
+  private transient ColumnAccessor columnAccessor;
+
+  // an accessor uses only column 0
+  private final int ORDINAL = 0;
+
+  protected OnHeapCachedBatch(int capacity, DataType type) {
+    super(capacity, type, MemoryMode.ON_HEAP_CACHEDBATCH);
+    reserveInternal(capacity);
+    reset();
+  }
+
+  @Override
+  public long valuesNativeAddress() {
+    throw new RuntimeException("Cannot get native address for on heap column");
+  }
+
+  @Override
+  public long nullsNativeAddress() {
+    throw new RuntimeException("Cannot get native address for on heap column");
+  }
+
+  @Override
+  public void close() {
+  }
+
+  private void initialize() {
+    if (columnAccessor == null) {
+      setColumnAccessor();
+    }
+    if (mutableRow == null) {
+      setRowSetter();
+    }
+  }
+
+  private void setColumnAccessor() {
+    ByteBuffer byteBuffer = ByteBuffer.wrap(buffer);
+    columnAccessor = ColumnAccessor$.MODULE$.apply(type, byteBuffer);
+    calledExtractTo = new boolean[capacity];
+  }
+
+  private void setRowSetter() {
+    unsafeRow = new UnsafeRow(1);
+    bufferHolder = new BufferHolder(unsafeRow);
+    rowWriter = new UnsafeRowWriter(bufferHolder, 1);
+    mutableRow = new MutableUnsafeRow(rowWriter);
+  }
+
+  // call extractTo() before getting actual data
+  private void prepareRowAccess(int rowId) {
```

--- End diff --

I agree with you. We can optimize these accesses by enhancing the existing APIs. Should we address these extensions in this PR?
[GitHub] spark issue #18502: [SPARK-21278][PYSPARK][WIP] Upgrade to Py4J 0.10.5
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18502 I'm going to close this PR for a while. Thank you and sorry for this PR, @srowen.
[GitHub] spark pull request #18502: [SPARK-21278][PYSPARK][WIP] Upgrade to Py4J 0.10....
Github user dongjoon-hyun closed the pull request at: https://github.com/apache/spark/pull/18502
[GitHub] spark pull request #17848: [SPARK-20586] [SQL] Add deterministic and distinc...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17848#discussion_r125387952

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala ---

```
@@ -85,8 +94,9 @@ case class UserDefinedFunction protected[sql] (
  * @since 2.3.0
  */
 def withName(name: String): this.type = {
-  this._nameOption = Option(name)
-  this
+  val udf = copyAll()
+  udf._nameOption = Option(name)
```

--- End diff --

@maropu We should make a copy when calling `withName`, instead of returning this object.
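The contract gatorsmile is asking for is the standard copy-on-modify pattern: `withName` should return a modified copy rather than mutate and return `this`, so two references to a UDF never share hidden mutable state. A hedged Python sketch of that pattern (a toy class, not PySpark's actual `UserDefinedFunction`):

```python
import copy

class UserDefinedFunction:
    def __init__(self, func, name=None):
        self.func = func
        self.name = name

    def with_name(self, name):
        # Return a copy so the receiver is left untouched; callers holding
        # the original object never observe the new name.
        udf = copy.copy(self)
        udf.name = name
        return udf

base = UserDefinedFunction(lambda x: x + 1)
named = base.with_name("plus_one")
print(base.name, named.name)  # None plus_one
```

Without the copy, `df.select(udf.with_name("a")(col), udf.with_name("b")(col))` would silently rename both references to "b", which is exactly the aliasing bug the review comment is guarding against.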
[GitHub] spark issue #18522: [MINOR]Closes stream and releases any system resources a...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18522 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79120/
[GitHub] spark issue #18522: [MINOR]Closes stream and releases any system resources a...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18522 Merged build finished. Test PASSed.
[GitHub] spark issue #18522: [MINOR]Closes stream and releases any system resources a...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18522 **[Test build #79120 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79120/testReport)** for PR 18522 at commit [`c0cf41d`](https://github.com/apache/spark/commit/c0cf41d2d7aeb0d02ed3593464072f3b083f3f6f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18502: [SPARK-21278][PYSPARK][WIP] Upgrade to Py4J 0.10.5
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18502 Actually, the Spark failure is due to a flaky test. However, for the PySpark failures, we are hitting https://github.com/bartdag/py4j/issues/278. We need to wait for 0.10.6.
[GitHub] spark pull request #18521: [SPARK-19507][SPARK-21296][PYTHON] Avoid per-reco...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18521#discussion_r125387151

--- Diff: python/pyspark/sql/types.py ---

```
@@ -1249,121 +1249,201 @@ def _infer_schema_type(obj, dataType):
 }
 
-def _verify_type(obj, dataType, nullable=True):
+def _make_type_verifier(dataType, nullable=True, name=None):
     """
     Verify the type of obj against dataType, raise a TypeError if they do not match.
 
     Also verify the value of obj against datatype, raise a ValueError if it's not within the allowed
     range, e.g. using 128 as ByteType will overflow. Note that, Python float is not checked, so it
     will become infinity when cast to Java float if it overflows.
 
-    >>> _verify_type(None, StructType([]))
-    >>> _verify_type("", StringType())
-    >>> _verify_type(0, LongType())
-    >>> _verify_type(list(range(3)), ArrayType(ShortType()))
-    >>> _verify_type(set(), ArrayType(StringType())) # doctest: +IGNORE_EXCEPTION_DETAIL
+    >>> _make_type_verifier(StructType([]))(None)
+    >>> _make_type_verifier(StringType())("")
+    >>> _make_type_verifier(LongType())(0)
+    >>> _make_type_verifier(ArrayType(ShortType()))(list(range(3)))
+    >>> _make_type_verifier(ArrayType(StringType()))(set()) # doctest: +IGNORE_EXCEPTION_DETAIL
     Traceback (most recent call last):
         ...
     TypeError:...
-    >>> _verify_type({}, MapType(StringType(), IntegerType()))
-    >>> _verify_type((), StructType([]))
-    >>> _verify_type([], StructType([]))
-    >>> _verify_type([1], StructType([])) # doctest: +IGNORE_EXCEPTION_DETAIL
+    >>> _make_type_verifier(MapType(StringType(), IntegerType()))({})
+    >>> _make_type_verifier(StructType([]))(())
+    >>> _make_type_verifier(StructType([]))([])
+    >>> _make_type_verifier(StructType([]))([1]) # doctest: +IGNORE_EXCEPTION_DETAIL
     Traceback (most recent call last):
         ...
     ValueError:...
     >>> # Check if numeric values are within the allowed range.
-    >>> _verify_type(12, ByteType())
-    >>> _verify_type(1234, ByteType()) # doctest: +IGNORE_EXCEPTION_DETAIL
+    >>> _make_type_verifier(ByteType())(12)
+    >>> _make_type_verifier(ByteType())(1234) # doctest: +IGNORE_EXCEPTION_DETAIL
     Traceback (most recent call last):
         ...
     ValueError:...
-    >>> _verify_type(None, ByteType(), False) # doctest: +IGNORE_EXCEPTION_DETAIL
+    >>> _make_type_verifier(ByteType(), False)(None) # doctest: +IGNORE_EXCEPTION_DETAIL
     Traceback (most recent call last):
         ...
     ValueError:...
-    >>> _verify_type([1, None], ArrayType(ShortType(), False)) # doctest: +IGNORE_EXCEPTION_DETAIL
+    >>> _make_type_verifier(
+    ...     ArrayType(ShortType(), False))([1, None]) # doctest: +IGNORE_EXCEPTION_DETAIL
     Traceback (most recent call last):
         ...
     ValueError:...
-    >>> _verify_type({None: 1}, MapType(StringType(), IntegerType()))
+    >>> _make_type_verifier(MapType(StringType(), IntegerType()))({None: 1})
     Traceback (most recent call last):
         ...
     ValueError:...
     >>> schema = StructType().add("a", IntegerType()).add("b", StringType(), False)
-    >>> _verify_type((1, None), schema) # doctest: +IGNORE_EXCEPTION_DETAIL
+    >>> _make_type_verifier(schema)((1, None)) # doctest: +IGNORE_EXCEPTION_DETAIL
     Traceback (most recent call last):
         ...
     ValueError:...
     """
-    if obj is None:
-        if nullable:
-            return
+
+    if name is None:
+        new_msg = lambda msg: msg
+        new_name = lambda n: "field %s" % n
+    else:
+        new_msg = lambda msg: "%s: %s" % (name, msg)
+        new_name = lambda n: "field %s in %s" % (n, name)
+
+    def verify_nullability(obj):
+        if obj is None:
+            if nullable:
+                return True
+            else:
+                raise ValueError(new_msg("This field is not nullable, but got None"))
         else:
-            raise ValueError("This field is not nullable, but got None")
+            return False
```

--- End diff --

sounds good. Will give a shot.
[GitHub] spark pull request #18521: [SPARK-19507][SPARK-21296][PYTHON] Avoid per-reco...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18521#discussion_r125386646

--- Diff: python/pyspark/sql/types.py --- (quoting the same `_make_type_verifier` hunk shown in the previous comment)

--- End diff --

how about:

```
def verify_nullability(obj):
    ...

if isinstance(dataType, StringType):
    def verify_string(obj):
        ...
    verify_value = verify_string
elif ...

def verify(obj):
    if (verify_nullability(obj)):
        return None
    verify_value(obj)

return verify
```
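cloud-fan's suggested shape resolves the data type once, binds a type-specific `verify_value` closure, and returns a `verify` function that runs the nullability check before delegating, so the per-record path never re-branches on the type. A simplified, runnable Python sketch of that factory pattern (the real `_make_type_verifier` in `pyspark/sql/types.py` dispatches on `DataType` objects and covers many more types; the string type tags here are illustrative):

```python
def make_type_verifier(data_type, nullable=True):
    """Build a per-type verifier once; the returned closure is what gets
    called for every record, with no type dispatch on the hot path."""

    def verify_nullability(obj):
        if obj is None:
            if nullable:
                return True  # allowed None; caller can stop here
            raise ValueError("This field is not nullable, but got None")
        return False

    # Resolve the type to a concrete value-checker exactly once.
    if data_type == "string":
        def verify_value(obj):
            if not isinstance(obj, str):
                raise TypeError("expected str, got %r" % type(obj))
    elif data_type == "byte":
        def verify_value(obj):
            if not isinstance(obj, int) or not (-128 <= obj <= 127):
                raise ValueError("byte value out of range: %r" % obj)
    else:
        raise ValueError("unsupported type: %r" % data_type)

    def verify(obj):
        if verify_nullability(obj):
            return  # early exit for an allowed None
        verify_value(obj)

    return verify

verify_byte = make_type_verifier("byte")
verify_byte(12)    # passes silently
verify_byte(None)  # nullable by default, so also passes
```

This is the core of the PR: the old `_verify_type` re-ran the `isinstance(dataType, ...)` chain for every record, while the factory pays that cost once per schema field.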
[GitHub] spark pull request #18521: [SPARK-19507][SPARK-21296][PYTHON] Avoid per-reco...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18521#discussion_r125386547

--- Diff: python/pyspark/sql/tests.py ---

```
@@ -30,6 +30,19 @@
 import functools
 import time
 import datetime
+import traceback
+
+if sys.version_info[:2] <= (2, 6):
```

--- End diff --

Ah, hmm... it should specifically check `<= (2, 6)`, since unittest2 is the backport of unittest from Python 2.7. To check minor versions, I think we should "extract" or compare with the raw string `'2.6'`. I will clean up this inconsistency in another PR later, if you are okay with it as-is.
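The guarded import under discussion is the usual backport pattern: `unittest2` is the standalone backport of Python 2.7's `unittest`, so the test suite should select it only on interpreters older than 2.7. A sketch of the check being described (writing the guard as `< (2, 7)` is equivalent to `<= (2, 6)` for CPython releases and makes the intent explicit; the error message is illustrative):

```python
import sys

if sys.version_info[:2] < (2, 7):
    try:
        # unittest2 backports Python 2.7's unittest features to 2.6.
        import unittest2 as unittest
    except ImportError:
        sys.stderr.write("Please install unittest2 to run the tests on Python 2.6\n")
        sys.exit(1)
else:
    import unittest

class SanityTest(unittest.TestCase):
    def test_truth(self):
        self.assertTrue(True)
```

On any modern interpreter the `else` branch runs and the standard-library `unittest` is bound to the same name, so the rest of the test file is version-agnostic.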
[GitHub] spark pull request #18174: [SPARK-20950][CORE]add a new config to diskWriteB...
Github user manku-timma commented on a diff in the pull request: https://github.com/apache/spark/pull/18174#discussion_r125386319

--- Diff: core/src/main/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriter.java ---

```
@@ -360,12 +368,10 @@ void forceSorterToSpill() throws IOException {
     final OutputStream bos = new BufferedOutputStream(
       new FileOutputStream(outputFile),
-      (int) sparkConf.getSizeAsKb("spark.shuffle.unsafe.file.output.buffer", "32k") * 1024);
+      outputBufferSizeInBytes);
```

--- End diff --

Just to understand what is happening:

1. Shuffle records are written to a serialization buffer (1M) after serialization.
2. The serialized buffer is written to the in-memory sorter's buffer.
3. Once the in-memory sorter's buffer is full, the data is copied to the sorter's disk buffer (1M).
4. The sorter's disk buffer is written out to a buffered output stream (buffer = 32k).

I am guessing that reducing the sorter's disk buffer (in step 3) helps because it triggers fewer writes/allocations per call at step 4 (allowing more parallelism between writing back to disk and copying data).
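Step 4 in the list above is a plain buffered file stream whose buffer size comes from configuration (`spark.shuffle.unsafe.file.output.buffer`, 32k by default). The effect of such a setting can be illustrated in Python, where `open`'s `buffering` argument plays the role of `BufferedOutputStream`'s buffer size (the file name and sizes here are illustrative):

```python
import os
import tempfile

OUTPUT_BUFFER_BYTES = 32 * 1024  # analogue of the 32k default

path = os.path.join(tempfile.mkdtemp(), "shuffle_output.bin")

# open(..., buffering=N) wraps the raw file in a BufferedWriter with an
# N-byte buffer, much like Java's BufferedOutputStream(FileOutputStream, N).
with open(path, "wb", buffering=OUTPUT_BUFFER_BYTES) as out:
    # Many small writes accumulate in the buffer; the OS sees fewer,
    # larger writes, which is the whole point of the buffer-size knob.
    for _ in range(64):
        out.write(b"a" * 1024)

print(os.path.getsize(path))  # 65536
```

Tuning the knob trades memory per open stream against syscall count: a larger buffer batches more small writes into each flush, while a smaller one bounds memory when many partitions are open at once.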
[GitHub] spark pull request #18516: [SPARK-21281][SQL] Throw AnalysisException if arr...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/18516#discussion_r125386326

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---

```
@@ -168,7 +173,9 @@ case class CreateMap(children: Seq[Expression]) extends Expression {
   override def foldable: Boolean = children.forall(_.foldable)
 
   override def checkInputDataTypes(): TypeCheckResult = {
-    if (children.size % 2 != 0) {
+    if (children == Nil) {
+      TypeCheckResult.TypeCheckFailure("input to function coalesce cannot be empty")
```

--- End diff --

oh, my bad.
[GitHub] spark pull request #18516: [SPARK-21281][SQL] Throw AnalysisException if arr...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/18516#discussion_r125386253

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---

```
@@ -448,6 +448,43 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext {
       rand(Random.nextLong()), randn(Random.nextLong())
     ).foreach(assertValuesDoNotChangeAfterCoalesceOrUnion(_))
   }
+
+  test("SPARK-21281 fails if functions have no argument") {
```

--- End diff --

ok.
[GitHub] spark issue #18502: [SPARK-21278][PYSPARK][WIP] Upgrade to Py4J 0.10.5
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18502 Merged build finished. Test FAILed.
[GitHub] spark issue #18502: [SPARK-21278][PYSPARK][WIP] Upgrade to Py4J 0.10.5
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18502 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79121/
[GitHub] spark issue #18502: [SPARK-21278][PYSPARK][WIP] Upgrade to Py4J 0.10.5
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18502 **[Test build #79121 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79121/testReport)** for PR 18502 at commit [`5ba7b11`](https://github.com/apache/spark/commit/5ba7b112ae110acc5e9908c47d1df67b7be3a58b). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #18521: [SPARK-19507][SPARK-21296][PYTHON] Avoid per-reco...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18521#discussion_r125386150

--- Diff: python/pyspark/sql/types.py ---
@@ -1249,121 +1249,201 @@ def _infer_schema_type(obj, dataType):
}

-def _verify_type(obj, dataType, nullable=True):
+def _make_type_verifier(dataType, nullable=True, name=None):
"""
Verify the type of obj against dataType, raise a TypeError if they do not match.
Also verify the value of obj against datatype, raise a ValueError if it's not within the allowed range, e.g. using 128 as ByteType will overflow. Note that, Python float is not checked, so it will become infinity when cast to Java float if it overflows.
->>> _verify_type(None, StructType([]))
->>> _verify_type("", StringType())
->>> _verify_type(0, LongType())
->>> _verify_type(list(range(3)), ArrayType(ShortType()))
->>> _verify_type(set(), ArrayType(StringType()))  # doctest: +IGNORE_EXCEPTION_DETAIL
+>>> _make_type_verifier(StructType([]))(None)
+>>> _make_type_verifier(StringType())("")
+>>> _make_type_verifier(LongType())(0)
+>>> _make_type_verifier(ArrayType(ShortType()))(list(range(3)))
+>>> _make_type_verifier(ArrayType(StringType()))(set())  # doctest: +IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
    ...
TypeError:...
->>> _verify_type({}, MapType(StringType(), IntegerType()))
->>> _verify_type((), StructType([]))
->>> _verify_type([], StructType([]))
->>> _verify_type([1], StructType([]))  # doctest: +IGNORE_EXCEPTION_DETAIL
+>>> _make_type_verifier(MapType(StringType(), IntegerType()))({})
+>>> _make_type_verifier(StructType([]))(())
+>>> _make_type_verifier(StructType([]))([])
+>>> _make_type_verifier(StructType([]))([1])  # doctest: +IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
    ...
ValueError:...
>>> # Check if numeric values are within the allowed range.
->>> _verify_type(12, ByteType())
->>> _verify_type(1234, ByteType())  # doctest: +IGNORE_EXCEPTION_DETAIL
+>>> _make_type_verifier(ByteType())(12)
+>>> _make_type_verifier(ByteType())(1234)  # doctest: +IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
    ...
ValueError:...
->>> _verify_type(None, ByteType(), False)  # doctest: +IGNORE_EXCEPTION_DETAIL
+>>> _make_type_verifier(ByteType(), False)(None)  # doctest: +IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
    ...
ValueError:...
->>> _verify_type([1, None], ArrayType(ShortType(), False))  # doctest: +IGNORE_EXCEPTION_DETAIL
+>>> _make_type_verifier(
+...     ArrayType(ShortType(), False))([1, None])  # doctest: +IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
    ...
ValueError:...
->>> _verify_type({None: 1}, MapType(StringType(), IntegerType()))
+>>> _make_type_verifier(MapType(StringType(), IntegerType()))({None: 1})
Traceback (most recent call last):
    ...
ValueError:...
>>> schema = StructType().add("a", IntegerType()).add("b", StringType(), False)
->>> _verify_type((1, None), schema)  # doctest: +IGNORE_EXCEPTION_DETAIL
+>>> _make_type_verifier(schema)((1, None))  # doctest: +IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
    ...
ValueError:...
"""
-if obj is None:
-    if nullable:
-        return
+
+if name is None:
+    new_msg = lambda msg: msg
+    new_name = lambda n: "field %s" % n
+else:
+    new_msg = lambda msg: "%s: %s" % (name, msg)
+    new_name = lambda n: "field %s in %s" % (n, name)
+
+def verify_nullability(obj):
+    if obj is None:
+        if nullable:
+            return True
+        else:
+            raise ValueError(new_msg("This field is not nullable, but got None"))
    else:
-        raise ValueError("This field is not nullable, but got None")
+        return False

# StringType can work with any types
if isinstance(dataType, StringType):
-    return
+    def verify_string(obj):
+        if verify_nullability(obj):
+            return None

--- End diff --

ah makes sense, but at least here we are not returning earlier, right?
[GitHub] spark issue #18445: [Spark-19726][SQL] Faild to insert null timestamp value ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18445 yes.
[GitHub] spark pull request #18521: [SPARK-19507][SPARK-21296][PYTHON] Avoid per-reco...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18521#discussion_r125385778

--- Diff: python/pyspark/sql/tests.py ---
@@ -30,6 +30,19 @@
import functools
import time
import datetime
+import traceback
+
+if sys.version_info[:2] <= (2, 6):

--- End diff --

I will keep it in mind.
[GitHub] spark pull request #18521: [SPARK-19507][SPARK-21296][PYTHON] Avoid per-reco...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18521#discussion_r125385638

--- Diff: python/pyspark/sql/tests.py ---
@@ -30,6 +30,19 @@
import functools
import time
import datetime
+import traceback
+
+if sys.version_info[:2] <= (2, 6):

--- End diff --

can we follow the existing style here? You can send a PR to update all of them later.
[GitHub] spark pull request #18521: [SPARK-19507][SPARK-21296][PYTHON] Avoid per-reco...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18521#discussion_r125385500

--- Diff: python/pyspark/sql/types.py ---
@@ -1249,121 +1249,201 @@ def _infer_schema_type(obj, dataType):
}

-def _verify_type(obj, dataType, nullable=True):
+def _make_type_verifier(dataType, nullable=True, name=None):
"""
Verify the type of obj against dataType, raise a TypeError if they do not match.
Also verify the value of obj against datatype, raise a ValueError if it's not within the allowed range, e.g. using 128 as ByteType will overflow. Note that, Python float is not checked, so it will become infinity when cast to Java float if it overflows.
->>> _verify_type(None, StructType([]))
->>> _verify_type("", StringType())
->>> _verify_type(0, LongType())
->>> _verify_type(list(range(3)), ArrayType(ShortType()))
->>> _verify_type(set(), ArrayType(StringType()))  # doctest: +IGNORE_EXCEPTION_DETAIL
+>>> _make_type_verifier(StructType([]))(None)
+>>> _make_type_verifier(StringType())("")
+>>> _make_type_verifier(LongType())(0)
+>>> _make_type_verifier(ArrayType(ShortType()))(list(range(3)))
+>>> _make_type_verifier(ArrayType(StringType()))(set())  # doctest: +IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
    ...
TypeError:...
->>> _verify_type({}, MapType(StringType(), IntegerType()))
->>> _verify_type((), StructType([]))
->>> _verify_type([], StructType([]))
->>> _verify_type([1], StructType([]))  # doctest: +IGNORE_EXCEPTION_DETAIL
+>>> _make_type_verifier(MapType(StringType(), IntegerType()))({})
+>>> _make_type_verifier(StructType([]))(())
+>>> _make_type_verifier(StructType([]))([])
+>>> _make_type_verifier(StructType([]))([1])  # doctest: +IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
    ...
ValueError:...
>>> # Check if numeric values are within the allowed range.
->>> _verify_type(12, ByteType())
->>> _verify_type(1234, ByteType())  # doctest: +IGNORE_EXCEPTION_DETAIL
+>>> _make_type_verifier(ByteType())(12)
+>>> _make_type_verifier(ByteType())(1234)  # doctest: +IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
    ...
ValueError:...
->>> _verify_type(None, ByteType(), False)  # doctest: +IGNORE_EXCEPTION_DETAIL
+>>> _make_type_verifier(ByteType(), False)(None)  # doctest: +IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
    ...
ValueError:...
->>> _verify_type([1, None], ArrayType(ShortType(), False))  # doctest: +IGNORE_EXCEPTION_DETAIL
+>>> _make_type_verifier(
+...     ArrayType(ShortType(), False))([1, None])  # doctest: +IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
    ...
ValueError:...
->>> _verify_type({None: 1}, MapType(StringType(), IntegerType()))
+>>> _make_type_verifier(MapType(StringType(), IntegerType()))({None: 1})
Traceback (most recent call last):
    ...
ValueError:...
>>> schema = StructType().add("a", IntegerType()).add("b", StringType(), False)
->>> _verify_type((1, None), schema)  # doctest: +IGNORE_EXCEPTION_DETAIL
+>>> _make_type_verifier(schema)((1, None))  # doctest: +IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
    ...
ValueError:...
"""
-if obj is None:
-    if nullable:
-        return
+
+if name is None:

--- End diff --

This looks a bit odd, but I could not figure out a shorter and cleaner way ...
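The pattern being reviewed above can be reduced to a few lines of plain Python: instead of re-dispatching on the data type for every record, a factory precomputes the error-message helper and the nullability check once and returns a verifier closure. The following is an illustrative sketch under that reading of the diff; `make_verifier` and its messages are stand-ins, not the actual PySpark `_make_type_verifier`.

```python
def make_verifier(expected_type, nullable=True, name=None):
    # Build the message-prefixing helper once, as the reviewed diff does with
    # its `new_msg` lambda (hypothetical names, mirroring the diff's shape).
    if name is None:
        new_msg = lambda msg: msg
    else:
        new_msg = lambda msg: "%s: %s" % (name, msg)

    def verify_nullability(obj):
        # True means "obj is None and that is acceptable": the caller can
        # skip the rest of the checks, which is the early-return question
        # raised in the review thread.
        if obj is None:
            if nullable:
                return True
            raise ValueError(new_msg("This field is not nullable, but got None"))
        return False

    def verify(obj):
        if verify_nullability(obj):
            return
        if not isinstance(obj, expected_type):
            raise TypeError(new_msg("expected %s, got %s"
                                    % (expected_type.__name__, type(obj).__name__)))

    return verify

# The closure is built once and then applied per record.
verify_int = make_verifier(int, nullable=False, name="field value")
verify_int(3)  # passes silently
```

The payoff is that all the per-field decisions (name prefix, nullability) are resolved at schema-construction time, leaving only cheap calls on the per-record hot path.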
[GitHub] spark pull request #17832: [SPARK-20557][SQL] Support for db column type TIM...
Github user atrigent commented on a diff in the pull request: https://github.com/apache/spark/pull/17832#discussion_r125385324

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala ---
@@ -223,6 +223,9 @@ object JdbcUtils extends Logging {
case java.sql.Types.STRUCT => StringType
case java.sql.Types.TIME => TimestampType
case java.sql.Types.TIMESTAMP => TimestampType
+case java.sql.Types.TIMESTAMP_WITH_TIMEZONE => TimestampType
+case -101 => TimestampType

--- End diff --

Why was this `-101` thing put here instead of in the Oracle dialect?
[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...
Github user pralabhkumar commented on the issue: https://github.com/apache/spark/pull/18118 ping @sethah @MLnick
[GitHub] spark pull request #18516: [SPARK-21281][SQL] Throw AnalysisException if arr...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18516#discussion_r125384875

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
@@ -448,6 +448,43 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext {
rand(Random.nextLong()), randn(Random.nextLong())
).foreach(assertValuesDoNotChangeAfterCoalesceOrUnion(_))
}
+
+  test("SPARK-21281 fails if functions have no argument") {

--- End diff --

Could you create a helper function to remove this duplicated code?
[GitHub] spark pull request #18521: [SPARK-19507][SPARK-21296][PYTHON] Avoid per-reco...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18521#discussion_r125384834

--- Diff: python/pyspark/sql/session.py ---
@@ -514,17 +514,21 @@ def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=Tr
schema = [str(x) for x in data.columns]
data = [r.tolist() for r in data.to_records(index=False)]
-verify_func = _verify_type if verifySchema else lambda _, t: True
if isinstance(schema, StructType):
+verify_func = _make_type_verifier(schema) if verifySchema else lambda _: True
+
def prepare(obj):
-verify_func(obj, schema)
+verify_func(obj)
return obj
elif isinstance(schema, DataType):
dataType = schema
schema = StructType().add("value", schema)
+verify_func = _make_type_verifier(
+dataType, name="field value") if verifySchema else lambda _: True

--- End diff --

Oh, wait, you mean `field value`. Yes, this is printed as is.
[GitHub] spark pull request #18516: [SPARK-21281][SQL] Throw AnalysisException if arr...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18516#discussion_r125384785

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
@@ -41,8 +41,13 @@ case class CreateArray(children: Seq[Expression]) extends Expression {
override def foldable: Boolean = children.forall(_.foldable)
- override def checkInputDataTypes(): TypeCheckResult =
-   TypeUtils.checkForSameTypeInputExpr(children.map(_.dataType), "function array")
+ override def checkInputDataTypes(): TypeCheckResult = {
+   if (children == Nil) {
+     TypeCheckResult.TypeCheckFailure("input to function coalesce cannot be empty")

--- End diff --

`coalesce`?
[GitHub] spark pull request #18516: [SPARK-21281][SQL] Throw AnalysisException if arr...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18516#discussion_r125384801

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
@@ -168,7 +173,9 @@ case class CreateMap(children: Seq[Expression]) extends Expression {
override def foldable: Boolean = children.forall(_.foldable)
override def checkInputDataTypes(): TypeCheckResult = {
-   if (children.size % 2 != 0) {
+   if (children == Nil) {
+     TypeCheckResult.TypeCheckFailure("input to function coalesce cannot be empty")

--- End diff --

`coalesce`?
[GitHub] spark pull request #18516: [SPARK-21281][SQL] Throw AnalysisException if arr...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18516#discussion_r125384578

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
@@ -448,6 +448,43 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext {
rand(Random.nextLong()), randn(Random.nextLong())
).foreach(assertValuesDoNotChangeAfterCoalesceOrUnion(_))
}
+
+  test("SPARK-21281 fails if functions have no argument") {
+    var errMsg = intercept[AnalysisException] {
+      spark.range(1).select(array())
+    }.getMessage
+    assert(errMsg.contains("due to data type mismatch: input to function coalesce cannot be empty"))
+
+    errMsg = intercept[AnalysisException] {
+      spark.range(1).select(map())
+    }.getMessage
+    assert(errMsg.contains("due to data type mismatch: input to function coalesce cannot be empty"))
+
+    // spark.range(1).select(coalesce())
+    errMsg = intercept[AnalysisException] {
+      spark.range(1).select(coalesce())
+    }.getMessage
+    assert(errMsg.contains("due to data type mismatch: input to function coalesce cannot be empty"))
+
+    // This hits java.lang.AssertionError
+    // spark.range(1).select(struct())
+
+    errMsg = intercept[IllegalArgumentException] {
+      spark.range(1).select(greatest())
+    }.getMessage
+    assert(errMsg.contains("requirement failed: greatest requires at least 2 arguments"))

--- End diff --

uh. I see.
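The `greatest()` case asserted above fails differently from the others because it trips a Scala `require` on the argument count rather than the type checker, hence the `requirement failed:` prefix. The guard can be sketched in plain Python (a hypothetical stand-in, not the PySpark `greatest`):

```python
def greatest(*cols):
    # Mirror of the arity guard behind the asserted message: Scala's require()
    # raises IllegalArgumentException with a "requirement failed: ..." prefix.
    if len(cols) < 2:
        raise ValueError("requirement failed: greatest requires at least 2 arguments")
    return max(cols)
```

Calling it with zero or one argument raises before any per-value logic runs, which is why the test intercepts the exception at `select` time.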
[GitHub] spark issue #17985: Add "full_outer" name to join types
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17985 **[Test build #79126 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79126/testReport)** for PR 17985 at commit [`9fc9a0a`](https://github.com/apache/spark/commit/9fc9a0ad567dfb28d22d94321fcef0ea3b1ae73b).
[GitHub] spark pull request #18516: [SPARK-21281][SQL] Throw AnalysisException if arr...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18516#discussion_r125384379

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
@@ -448,6 +448,43 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext {
rand(Random.nextLong()), randn(Random.nextLong())
).foreach(assertValuesDoNotChangeAfterCoalesceOrUnion(_))
}
+
+  test("SPARK-21281 fails if functions have no argument") {

--- End diff --

Could you move these functions to `.sql`?
[GitHub] spark issue #18228: [SPARK-21007][SQL]Add SQL function - RIGHT && LEFT
Github user 10110346 commented on the issue: https://github.com/apache/spark/pull/18228 `left` and `right` are most commonly expressed via `substring`, and I think offering them directly is friendlier for users. MySQL and SQL Server also support these two functions alongside `substring`. @viirya
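For context on why the two are considered conveniences: LEFT(s, n) is equivalent to SUBSTRING(s, 1, n), and RIGHT(s, n) to SUBSTRING(s, -n, n) in dialects where a negative start position counts from the end of the string. A plain-Python sketch of those semantics (illustrative only, not Spark's implementation):

```python
def left(s, n):
    # First n characters; a shorter string is returned whole, as in SQL LEFT.
    return s[:n]

def right(s, n):
    # Last n characters; n == 0 yields the empty string, as in SQL RIGHT.
    return s[-n:] if n > 0 else ""
```

The `n > 0` guard matters because `s[-0:]` would return the whole string, which is not what SQL RIGHT(s, 0) does.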
[GitHub] spark issue #17985: Add "full_outer" name to join types
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/17985 retest this please
[GitHub] spark pull request #18521: [SPARK-19507][SPARK-21296][PYTHON] Avoid per-reco...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18521#discussion_r125384150

--- Diff: python/pyspark/sql/types.py ---
@@ -1249,121 +1249,201 @@ def _infer_schema_type(obj, dataType):
}

-def _verify_type(obj, dataType, nullable=True):
+def _make_type_verifier(dataType, nullable=True, name=None):
"""
Verify the type of obj against dataType, raise a TypeError if they do not match.
Also verify the value of obj against datatype, raise a ValueError if it's not within the allowed range, e.g. using 128 as ByteType will overflow. Note that, Python float is not checked, so it will become infinity when cast to Java float if it overflows.
->>> _verify_type(None, StructType([]))
->>> _verify_type("", StringType())
->>> _verify_type(0, LongType())
->>> _verify_type(list(range(3)), ArrayType(ShortType()))
->>> _verify_type(set(), ArrayType(StringType()))  # doctest: +IGNORE_EXCEPTION_DETAIL
+>>> _make_type_verifier(StructType([]))(None)
+>>> _make_type_verifier(StringType())("")
+>>> _make_type_verifier(LongType())(0)
+>>> _make_type_verifier(ArrayType(ShortType()))(list(range(3)))
+>>> _make_type_verifier(ArrayType(StringType()))(set())  # doctest: +IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
    ...
TypeError:...
->>> _verify_type({}, MapType(StringType(), IntegerType()))
->>> _verify_type((), StructType([]))
->>> _verify_type([], StructType([]))
->>> _verify_type([1], StructType([]))  # doctest: +IGNORE_EXCEPTION_DETAIL
+>>> _make_type_verifier(MapType(StringType(), IntegerType()))({})
+>>> _make_type_verifier(StructType([]))(())
+>>> _make_type_verifier(StructType([]))([])
+>>> _make_type_verifier(StructType([]))([1])  # doctest: +IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
    ...
ValueError:...
>>> # Check if numeric values are within the allowed range.
->>> _verify_type(12, ByteType())
->>> _verify_type(1234, ByteType())  # doctest: +IGNORE_EXCEPTION_DETAIL
+>>> _make_type_verifier(ByteType())(12)
+>>> _make_type_verifier(ByteType())(1234)  # doctest: +IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
    ...
ValueError:...
->>> _verify_type(None, ByteType(), False)  # doctest: +IGNORE_EXCEPTION_DETAIL
+>>> _make_type_verifier(ByteType(), False)(None)  # doctest: +IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
    ...
ValueError:...
->>> _verify_type([1, None], ArrayType(ShortType(), False))  # doctest: +IGNORE_EXCEPTION_DETAIL
+>>> _make_type_verifier(
+...     ArrayType(ShortType(), False))([1, None])  # doctest: +IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
    ...
ValueError:...
->>> _verify_type({None: 1}, MapType(StringType(), IntegerType()))
+>>> _make_type_verifier(MapType(StringType(), IntegerType()))({None: 1})
Traceback (most recent call last):
    ...
ValueError:...
>>> schema = StructType().add("a", IntegerType()).add("b", StringType(), False)
->>> _verify_type((1, None), schema)  # doctest: +IGNORE_EXCEPTION_DETAIL
+>>> _make_type_verifier(schema)((1, None))  # doctest: +IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
    ...
ValueError:...
"""
-if obj is None:
-    if nullable:
-        return
+
+if name is None:
+    new_msg = lambda msg: msg
+    new_name = lambda n: "field %s" % n
+else:
+    new_msg = lambda msg: "%s: %s" % (name, msg)
+    new_name = lambda n: "field %s in %s" % (n, name)
+
+def verify_nullability(obj):
+    if obj is None:
+        if nullable:
+            return True
+        else:
+            raise ValueError(new_msg("This field is not nullable, but got None"))
    else:
-        raise ValueError("This field is not nullable, but got None")
+        return False

# StringType can work with any types
if isinstance(dataType, StringType):
-    return
+    def verify_string(obj):
+        if verify_nullability(obj):
+            return None

--- End diff --

```python
def A():
    print "a"

def B():
    print "a"
    return None

def C():
    print "a"
    return

print(A() is None)
print(A() == B() == C())
```

These are synonyms. I believe this is also about preference - `return` vs `return None` if we should
[GitHub] spark issue #18469: [SPARK-21256] [SQL] Add withSQLConf to Catalyst Test
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18469 **[Test build #79125 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79125/testReport)** for PR 18469 at commit [`7431a8d`](https://github.com/apache/spark/commit/7431a8df09fada093d47abb49079de81cdbd1d9e).
[GitHub] spark pull request #18521: [SPARK-19507][SPARK-21296][PYTHON] Avoid per-reco...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18521#discussion_r125383919

--- Diff: python/pyspark/sql/session.py ---
@@ -514,17 +514,21 @@ def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=Tr
schema = [str(x) for x in data.columns]
data = [r.tolist() for r in data.to_records(index=False)]
-verify_func = _verify_type if verifySchema else lambda _, t: True
if isinstance(schema, StructType):
+verify_func = _make_type_verifier(schema) if verifySchema else lambda _: True
+
def prepare(obj):
-verify_func(obj, schema)
+verify_func(obj)
return obj
elif isinstance(schema, DataType):
dataType = schema
schema = StructType().add("value", schema)
+verify_func = _make_type_verifier(
+dataType, name="field value") if verifySchema else lambda _: True

--- End diff --

I don't think so. It should return `None` I think, but I just wanted to avoid other changes for reviewing.
[GitHub] spark pull request #18521: [SPARK-19507][SPARK-21296][PYTHON] Avoid per-reco...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18521#discussion_r125383823

--- Diff: python/pyspark/sql/tests.py ---
@@ -30,6 +30,19 @@
import functools
import time
import datetime
+import traceback
+
+if sys.version_info[:2] <= (2, 6):

--- End diff --

I think it is a preference; I don't think either is particularly better. Per the documentation, it sounds like `version_info` is preferred - https://docs.python.org/2/library/sys.html#sys.version

> Do not extract version information out of it, rather, use `version_info` and the functions

although we are not "extract"ing anyway ...
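The recommendation quoted from the sys docs is to compare the structured `sys.version_info` tuple rather than parse the `sys.version` string. A minimal sketch of the check under review (the flag name is illustrative, not from the patch):

```python
import sys

# Tuples compare element-wise, so (2, 6) correctly sorts before (2, 7)
# and (3, 0); string comparison on sys.version would not be safe here.
if sys.version_info[:2] <= (2, 6):
    RUNNING_LEGACY_PYTHON = True   # e.g. fall back to unittest2, as old PySpark tests did
else:
    RUNNING_LEGACY_PYTHON = False
```

Slicing to `[:2]` compares only (major, minor) and ignores the micro version and release level, which is usually the granularity such compatibility shims need.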
[GitHub] spark issue #18469: [SPARK-21256] [SQL] Add withSQLConf to Catalyst Test
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18469 retest this please
[GitHub] spark pull request #18460: [SPARK-21247][SQL] Allow case-insensitive type eq...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18460#discussion_r125383817

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala ---
@@ -79,8 +80,12 @@ abstract class DataType extends AbstractDataType {
/**
 * Check if `this` and `other` are the same data type when ignoring nullability
 * (`StructField.nullable`, `ArrayType.containsNull`, and `MapType.valueContainsNull`).
 */
- private[spark] def sameType(other: DataType): Boolean =
-   DataType.equalsIgnoreNullability(this, other)
+ private[spark] def sameType(other: DataType, isCaseSensitive: Boolean = true): Boolean =

--- End diff --

maybe we should not consider field names in `sameType`, @gatorsmile what do you think?
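The semantics being debated, comparing two struct schemas while ignoring nullability, with field names optionally matched case-insensitively, can be illustrated with a toy model in which a schema is just a list of (name, type) pairs. This is a sketch of the idea, not Spark's `DataType.sameType`:

```python
def same_type(a, b, case_sensitive=True):
    # a, b: schemas modeled as lists of (field_name, type_name) pairs.
    # Nullability is simply absent from the model, mirroring how sameType
    # ignores StructField.nullable and friends.
    if len(a) != len(b):
        return False
    norm = (lambda n: n) if case_sensitive else (lambda n: n.lower())
    return all(norm(n1) == norm(n2) and t1 == t2
               for (n1, t1), (n2, t2) in zip(a, b))
```

cloud-fan's alternative, not considering field names at all, would drop the `norm(n1) == norm(n2)` clause entirely, making the case-sensitivity flag moot.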
[GitHub] spark issue #18520: [SPARK-21295] [SQL] Use qualified names in error message...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18520 **[Test build #79124 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79124/testReport)** for PR 18520 at commit [`0b9f860`](https://github.com/apache/spark/commit/0b9f860cee44bb06feeb291b566243e139cbaf28).
[GitHub] spark issue #18228: [SPARK-21007][SQL]Add SQL function - RIGHT && LEFT
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18228 As we already have `substring`, do we still need them?
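viirya's point is that `LEFT` and `RIGHT` are expressible in terms of `substring`. A Python sketch of the equivalence (the 1-based, negative-position semantics here approximate Spark SQL's `substring(str, pos, len)`; helper names are illustrative):

```python
def sql_substring(s, pos, length):
    """Roughly Spark SQL's 1-based substring(str, pos, len)."""
    if pos > 0:
        start = pos - 1                  # SQL positions are 1-based
    elif pos < 0:
        start = max(len(s) + pos, 0)     # negative pos counts from the end
    else:
        start = 0
    return s[start:start + length]

def left(s, n):
    # LEFT(s, n) == substring(s, 1, n)
    return sql_substring(s, 1, n)

def right(s, n):
    # RIGHT(s, n) == substring(s, -n, n)
    return sql_substring(s, -n, n) if n > 0 else ""

print(left("Spark SQL", 5))   # Spark
print(right("Spark SQL", 3))  # SQL
```

This shows why the new functions are convenience wrappers rather than new capability — the counter-argument being that `LEFT`/`RIGHT` are standard in other SQL dialects and aid portability.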
[GitHub] spark issue #18469: [SPARK-21256] [SQL] Add withSQLConf to Catalyst Test
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18469 Merged build finished. Test FAILed.
[GitHub] spark issue #18469: [SPARK-21256] [SQL] Add withSQLConf to Catalyst Test
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18469 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79115/ Test FAILed.
[GitHub] spark issue #18469: [SPARK-21256] [SQL] Add withSQLConf to Catalyst Test
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18469 **[Test build #79115 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79115/testReport)** for PR 18469 at commit [`7431a8d`](https://github.com/apache/spark/commit/7431a8df09fada093d47abb49079de81cdbd1d9e).
* This patch **fails PySpark pip packaging tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types that re...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/18444 I guess you can declare the source code encoding at the top of the file, like:

```
# -*- coding: utf-8 -*-
```
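For context on ueshin's suggestion: under Python 2, a source file containing non-ASCII literals must declare its encoding in one of its first two lines (PEP 263), or the interpreter raises a `SyntaxError`; under Python 3 the default source encoding is already UTF-8, so the declaration is a harmless no-op. A minimal sketch:

```python
# -*- coding: utf-8 -*-
# The comment above is the PEP 263 encoding declaration. Python 2 needs it to
# parse the non-ASCII literal below; Python 3 ignores it (UTF-8 is the default).

greeting = u"héllo"   # non-ASCII literal that would break Python 2 without the declaration
print(len(greeting))  # 5
```

This is why the suggestion fixes the PR's test file: the tests exercise non-ASCII data and must still parse under the Python 2 interpreters in Spark's CI matrix.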
[GitHub] spark issue #17633: [SPARK-20331][SQL] Enhanced Hive partition pruning predi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17633 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79117/ Test PASSed.
[GitHub] spark issue #17633: [SPARK-20331][SQL] Enhanced Hive partition pruning predi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17633 Merged build finished. Test PASSed.
[GitHub] spark issue #18174: [SPARK-20950][CORE]add a new config to diskWriteBufferSi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18174 **[Test build #79123 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79123/testReport)** for PR 18174 at commit [`3efc743`](https://github.com/apache/spark/commit/3efc7433802155c957e78d23abf4847cde8e0d07).
[GitHub] spark issue #17633: [SPARK-20331][SQL] Enhanced Hive partition pruning predi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17633 **[Test build #79117 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79117/testReport)** for PR 17633 at commit [`7965ef3`](https://github.com/apache/spark/commit/7965ef35ce45dbaabb7ba525a1b41625365b9da6).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18174: [SPARK-20950][CORE]add a new config to diskWriteBufferSi...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18174 retest this please
[GitHub] spark pull request #18521: [SPARK-19507][SPARK-21296][PYTHON] Avoid per-reco...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18521#discussion_r125382241

--- Diff: python/pyspark/sql/types.py ---

@@ -1249,121 +1249,201 @@ def _infer_schema_type(obj, dataType):
 }


-def _verify_type(obj, dataType, nullable=True):
+def _make_type_verifier(dataType, nullable=True, name=None):
     """
     Verify the type of obj against dataType, raise a TypeError if they do not match.

     Also verify the value of obj against datatype, raise a ValueError if it's not
     within the allowed range, e.g. using 128 as ByteType will overflow. Note that,
     Python float is not checked, so it will become infinity when cast to Java float
     if it overflows.

-    >>> _verify_type(None, StructType([]))
-    >>> _verify_type("", StringType())
-    >>> _verify_type(0, LongType())
-    >>> _verify_type(list(range(3)), ArrayType(ShortType()))
-    >>> _verify_type(set(), ArrayType(StringType())) # doctest: +IGNORE_EXCEPTION_DETAIL
+    >>> _make_type_verifier(StructType([]))(None)
+    >>> _make_type_verifier(StringType())("")
+    >>> _make_type_verifier(LongType())(0)
+    >>> _make_type_verifier(ArrayType(ShortType()))(list(range(3)))
+    >>> _make_type_verifier(ArrayType(StringType()))(set()) # doctest: +IGNORE_EXCEPTION_DETAIL
     Traceback (most recent call last):
         ...
     TypeError:...
-    >>> _verify_type({}, MapType(StringType(), IntegerType()))
-    >>> _verify_type((), StructType([]))
-    >>> _verify_type([], StructType([]))
-    >>> _verify_type([1], StructType([])) # doctest: +IGNORE_EXCEPTION_DETAIL
+    >>> _make_type_verifier(MapType(StringType(), IntegerType()))({})
+    >>> _make_type_verifier(StructType([]))(())
+    >>> _make_type_verifier(StructType([]))([])
+    >>> _make_type_verifier(StructType([]))([1]) # doctest: +IGNORE_EXCEPTION_DETAIL
     Traceback (most recent call last):
         ...
     ValueError:...
     >>> # Check if numeric values are within the allowed range.
-    >>> _verify_type(12, ByteType())
-    >>> _verify_type(1234, ByteType()) # doctest: +IGNORE_EXCEPTION_DETAIL
+    >>> _make_type_verifier(ByteType())(12)
+    >>> _make_type_verifier(ByteType())(1234) # doctest: +IGNORE_EXCEPTION_DETAIL
     Traceback (most recent call last):
         ...
     ValueError:...
-    >>> _verify_type(None, ByteType(), False) # doctest: +IGNORE_EXCEPTION_DETAIL
+    >>> _make_type_verifier(ByteType(), False)(None) # doctest: +IGNORE_EXCEPTION_DETAIL
     Traceback (most recent call last):
         ...
     ValueError:...
-    >>> _verify_type([1, None], ArrayType(ShortType(), False)) # doctest: +IGNORE_EXCEPTION_DETAIL
+    >>> _make_type_verifier(
+    ...     ArrayType(ShortType(), False))([1, None]) # doctest: +IGNORE_EXCEPTION_DETAIL
     Traceback (most recent call last):
         ...
     ValueError:...
-    >>> _verify_type({None: 1}, MapType(StringType(), IntegerType()))
+    >>> _make_type_verifier(MapType(StringType(), IntegerType()))({None: 1})
     Traceback (most recent call last):
         ...
     ValueError:...
     >>> schema = StructType().add("a", IntegerType()).add("b", StringType(), False)
-    >>> _verify_type((1, None), schema) # doctest: +IGNORE_EXCEPTION_DETAIL
+    >>> _make_type_verifier(schema)((1, None)) # doctest: +IGNORE_EXCEPTION_DETAIL
     Traceback (most recent call last):
         ...
     ValueError:...
     """
-    if obj is None:
-        if nullable:
-            return
+
+    if name is None:
+        new_msg = lambda msg: msg
+        new_name = lambda n: "field %s" % n
+    else:
+        new_msg = lambda msg: "%s: %s" % (name, msg)
+        new_name = lambda n: "field %s in %s" % (n, name)
+
+    def verify_nullability(obj):
+        if obj is None:
+            if nullable:
+                return True
+            else:
+                raise ValueError(new_msg("This field is not nullable, but got None"))
         else:
-            raise ValueError("This field is not nullable, but got None")
+            return False

     # StringType can work with any types
     if isinstance(dataType, StringType):
-        return
+        def verify_string(obj):
+            if verify_nullability(obj):
+                return None

--- End diff --

Why does a verify method need to return something?
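The refactor under review replaces a per-record type dispatch with a verifier built once per schema. A minimal Python sketch of that pattern (hypothetical names and a toy two-type dispatch, not the real pyspark implementation): the factory inspects the data type up front and returns a closure that is then called for every record, so the `isinstance` dispatch cost is paid once instead of once per row.

```python
def make_type_verifier(data_type, nullable=True, name=None):
    """Build and return a per-record verifier closure for `data_type`."""
    prefix = "" if name is None else "%s: " % name

    def verify_nullability(obj):
        # returns True when obj is None and allowed, so the caller can stop early
        if obj is None:
            if nullable:
                return True
            raise ValueError(prefix + "This field is not nullable, but got None")
        return False

    # type dispatch happens ONCE here, not on every record
    if data_type == "string":
        def verify(obj):
            verify_nullability(obj)  # any non-None object is acceptable as a string
    elif data_type == "int":
        def verify(obj):
            if not verify_nullability(obj):
                if not isinstance(obj, int):
                    raise TypeError(prefix + "int expected, got %r" % (obj,))
    else:
        raise ValueError("unsupported type in this sketch: %r" % (data_type,))
    return verify

verify_int = make_type_verifier("int", nullable=False, name="field value")
verify_int(42)       # passes silently
# verify_int(None)   # would raise ValueError: field value: This field is not nullable...
```

This also shows why the inner verifiers return a value at all (the question raised in the review): `verify_nullability` signals "obj was None and is allowed, stop checking" to the closure that wraps it.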
[GitHub] spark pull request #18521: [SPARK-19507][SPARK-21296][PYTHON] Avoid per-reco...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18521#discussion_r125382032

--- Diff: python/pyspark/sql/session.py ---

@@ -514,17 +514,21 @@ def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=Tr
             schema = [str(x) for x in data.columns]
             data = [r.tolist() for r in data.to_records(index=False)]

-        verify_func = _verify_type if verifySchema else lambda _, t: True
         if isinstance(schema, StructType):
+            verify_func = _make_type_verifier(schema) if verifySchema else lambda _: True
+
             def prepare(obj):
-                verify_func(obj, schema)
+                verify_func(obj)
                 return obj
         elif isinstance(schema, DataType):
             dataType = schema
             schema = StructType().add("value", schema)
+
+            verify_func = _make_type_verifier(
+                dataType, name="field value") if verifySchema else lambda _: True

--- End diff --

is "field value" useful in the error message?
[GitHub] spark pull request #18522: [MINOR]Closes stream and releases any system reso...
Github user 10110346 commented on a diff in the pull request: https://github.com/apache/spark/pull/18522#discussion_r125380867

--- Diff: core/src/main/scala/org/apache/spark/util/logging/FileAppender.scala ---

@@ -76,7 +76,11 @@ private[spark] class FileAppender(inputStream: InputStream, file: File, bufferSi
           }
         }
       }
     } {
-      closeFile()
+      try {
+        inputStream.close()

--- End diff --

Another reason is that this function runs in another thread.
[GitHub] spark pull request #18521: [SPARK-19507][SPARK-21296][PYTHON] Avoid per-reco...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18521#discussion_r125380790

--- Diff: python/pyspark/sql/tests.py ---

@@ -30,6 +30,19 @@
 import functools
 import time
 import datetime
+import traceback
+
+if sys.version_info[:2] <= (2, 6):

--- End diff --

This is different from how we check the Python version in other files; is this better than `sys.version >= '3'`?
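On the question of `sys.version_info[:2] <= (2, 6)` versus `sys.version >= '3'`: tuple comparison is numeric per component, while comparing the version *string* is lexicographic, which misorders multi-digit components. A short demonstration:

```python
import sys

# Tuples compare component-wise and numerically: correct for any version.
print((2, 6) <= (2, 10))   # True  (6 <= 10)

# Strings compare character by character: '6' > '1', so "2.6" > "2.10" — wrong.
print("2.6" <= "2.10")     # False

# The tuple form used in the PR is therefore the more robust idiom:
if sys.version_info[:2] <= (2, 6):
    print("running on Python <= 2.6")
else:
    print("running on a newer Python")
```

The `sys.version >= '3'` idiom used elsewhere in pyspark happens to work for distinguishing 2.x from 3.x, but the tuple comparison generalizes safely to any minor-version check.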
[GitHub] spark pull request #18468: [SPARK-20873][SQL] Enhance ColumnVector to suppor...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18468#discussion_r125380209

--- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapCachedBatch.java ---

@@ -0,0 +1,403 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution.vectorized;
+
+import java.nio.ByteBuffer;
+
+import org.apache.spark.memory.MemoryMode;
+import org.apache.spark.sql.catalyst.expressions.UnsafeRow;
+import org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder;
+import org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter;
+import org.apache.spark.sql.execution.columnar.*;
+import org.apache.spark.sql.types.*;
+import org.apache.spark.unsafe.types.UTF8String;
+
+/**
+ * A column backed by an in memory JVM array.
+ */
+public final class OnHeapCachedBatch extends ColumnVector implements java.io.Serializable {
+
+  // keep compressed data
+  private byte[] buffer;
+
+  // whether a row is already extracted or not. If extractTo() is called, set true
+  // e.g. when isNullAt() and getInt() are called, extractTo() must be called only once
+  private boolean[] calledExtractTo;
+
+  // a row where the compressed data is extracted
+  private transient UnsafeRow unsafeRow;
+  private transient BufferHolder bufferHolder;
+  private transient UnsafeRowWriter rowWriter;
+  private transient MutableUnsafeRow mutableRow;
+
+  // accessor for a column
+  private transient ColumnAccessor columnAccessor;
+
+  // an accessor uses only column 0
+  private final int ORDINAL = 0;
+
+  protected OnHeapCachedBatch(int capacity, DataType type) {
+    super(capacity, type, MemoryMode.ON_HEAP_CACHEDBATCH);
+    reserveInternal(capacity);
+    reset();
+  }
+
+  @Override
+  public long valuesNativeAddress() {
+    throw new RuntimeException("Cannot get native address for on heap column");
+  }
+
+  @Override
+  public long nullsNativeAddress() {
+    throw new RuntimeException("Cannot get native address for on heap column");
+  }
+
+  @Override
+  public void close() {
+  }
+
+  private void initialize() {
+    if (columnAccessor == null) {
+      setColumnAccessor();
+    }
+    if (mutableRow == null) {
+      setRowSetter();
+    }
+  }
+
+  private void setColumnAccessor() {
+    ByteBuffer byteBuffer = ByteBuffer.wrap(buffer);
+    columnAccessor = ColumnAccessor$.MODULE$.apply(type, byteBuffer);
+    calledExtractTo = new boolean[capacity];
+  }
+
+  private void setRowSetter() {
+    unsafeRow = new UnsafeRow(1);
+    bufferHolder = new BufferHolder(unsafeRow);
+    rowWriter = new UnsafeRowWriter(bufferHolder, 1);
+    mutableRow = new MutableUnsafeRow(rowWriter);
+  }
+
+  // call extractTo() before getting actual data
+  private void prepareRowAccess(int rowId) {

--- End diff --

It looks weird that we put the value into a row and then read that value back from the row; can we return the value directly? e.g. `columnAccessor.extractTo` should be able to take a `ColumnVector` as input and set the value on it.
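The access pattern cloud-fan is questioning can be sketched in a few lines of Python (hypothetical and heavily simplified; the real code works on compressed byte buffers and `UnsafeRow`s): on first access a value is decoded from the compressed batch into an intermediate row, a per-row flag records that the extraction happened, and subsequent reads go through that row. The review suggestion is to skip the intermediate row and have the accessor write into the column vector directly.

```python
class CachedColumn:
    """Toy model of OnHeapCachedBatch's lazy per-row extraction."""

    def __init__(self, compressed):
        self._compressed = compressed                 # stand-in for the byte[] buffer
        self._extracted = [False] * len(compressed)   # the calledExtractTo flags
        self._row = {}                                # stand-in for the intermediate UnsafeRow

    def _decode(self, encoded):
        # toy "decompression" of a single value
        return encoded - 1000

    def _prepare_row_access(self, row_id):
        # extract at most once per row, as prepareRowAccess() does
        if not self._extracted[row_id]:
            self._row[row_id] = self._decode(self._compressed[row_id])
            self._extracted[row_id] = True

    def get_int(self, row_id):
        self._prepare_row_access(row_id)
        return self._row[row_id]   # note the indirection through the row

col = CachedColumn([1001, 1002, 1003])
print(col.get_int(1))  # 2
```

The critique maps to `get_int` returning `self._decode(...)` directly instead of round-tripping through `self._row`: one fewer copy per value, at the cost of restructuring the accessor API.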
[GitHub] spark pull request #18522: [MINOR]Closes stream and releases any system reso...
Github user 10110346 commented on a diff in the pull request: https://github.com/apache/spark/pull/18522#discussion_r125379907

--- Diff: core/src/main/scala/org/apache/spark/util/logging/FileAppender.scala ---

@@ -76,7 +76,11 @@ private[spark] class FileAppender(inputStream: InputStream, file: File, bufferSi
           }
         }
       }
     } {
-      closeFile()
+      try {
+        inputStream.close()

--- End diff --

Yes, you are right. But this function is only used in `ExecutorRunner`; also, if an exception occurs within this function, this will ensure the inputStream is closed.
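The fix under discussion, transposed to Python as a sketch (illustrative names; the real code is Scala's `FileAppender`): close the source stream in a `finally`-style cleanup so it is released even when the copy loop — which runs on a separate thread in `FileAppender` — dies with an exception, and guard the close itself so a cleanup failure does not mask the original error.

```python
import io
import logging

def append_stream_to_file(input_stream, output_stream, buffer_size=8192):
    """Copy input_stream to output_stream, always closing the input."""
    try:
        while True:
            chunk = input_stream.read(buffer_size)
            if not chunk:
                break
            output_stream.write(chunk)
    finally:
        try:
            input_stream.close()   # release the stream even on failure
        except Exception:
            # don't let a failing close() shadow the original exception
            logging.exception("Error closing input stream")

src = io.BytesIO(b"hello")
dst = io.BytesIO()
append_stream_to_file(src, dst)
print(dst.getvalue())  # b'hello'
print(src.closed)      # True
```

Because the appender thread's exception would otherwise be swallowed, closing in the cleanup block is the only reliable way to release the OS-level resource, which is the point 10110346 is making.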
[GitHub] spark pull request #18468: [SPARK-20873][SQL] Enhance ColumnVector to suppor...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18468#discussion_r125379816

--- Diff: core/src/main/java/org/apache/spark/memory/MemoryMode.java ---

@@ -22,5 +22,6 @@
 @Private
 public enum MemoryMode {
   ON_HEAP,
-  OFF_HEAP
+  OFF_HEAP,
+  ON_HEAP_CACHEDBATCH

--- End diff --

hmm, I don't think this can be a new memory mode...