[GitHub] spark pull request #16684: [SPARK-16101][HOTFIX] Fix the build with Scala 2....
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16684 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16308: [SPARK-18936][SQL] Infrastructure for session loc...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/16308#discussion_r97490521

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DateFunctionsSuite.scala ---
@@ -475,6 +1164,45 @@ class DateFunctionsSuite extends QueryTest with SharedSQLContext {
       Row(ts1.getTime / 1000L), Row(ts2.getTime / 1000L)))
   }
+  test("to_unix_timestamp with session local timezone") {
--- End diff --

I agree that there are so many similar tests, but I have no idea how to generalize them. Would you please give me some code snippets? I'll be able to expand them.
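The generalization being asked about is usually a table-driven test: one helper plus a list of (input, format, timezone, expected) rows. A minimal sketch of the idea in plain Java (hypothetical names; the real suite would use ScalaTest and Spark's `to_unix_timestamp`):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

public class UnixTimestampSketch {
    // Parse a timestamp string in the given "session" timezone and return
    // epoch seconds, mirroring to_unix_timestamp's observable behavior.
    static long toUnixTimestamp(String s, String fmt, String tz) {
        SimpleDateFormat sdf = new SimpleDateFormat(fmt);
        sdf.setTimeZone(TimeZone.getTimeZone(tz));
        try {
            return sdf.parse(s).getTime() / 1000L;
        } catch (ParseException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // One row per former standalone test: input, format, timezone, expected seconds.
        Object[][] cases = {
            {"2015-01-01 00:00:00", "yyyy-MM-dd HH:mm:ss", "UTC", 1420070400L},
            {"2015-01-01 00:00:00", "yyyy-MM-dd HH:mm:ss", "America/Los_Angeles", 1420099200L},
        };
        for (Object[] c : cases) {
            long got = toUnixTimestamp((String) c[0], (String) c[1], (String) c[2]);
            if (got != (Long) c[3]) {
                throw new AssertionError(c[0] + " in " + c[2] + ": got " + got);
            }
        }
        System.out.println("all cases passed");
    }
}
```

The same shape transfers directly to ScalaTest: a `Seq` of tuples and one loop can replace N copy-pasted test bodies.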
[GitHub] spark pull request #16138: [SPARK-16609] Add to_date/to_timestamp with forma...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16138#discussion_r97490239

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ---
@@ -1047,6 +1048,64 @@ case class ToDate(child: Expression) extends UnaryExpression with ImplicitCastIn
 }

 /**
+ * Parses a column to a date based on the given format.
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(date_str, fmt) - Parses the `left` expression with the `fmt` expression. Returns null with invalid input.",
+  extended = """
+    Examples:
+      > SELECT _FUNC_('2016-12-31', 'yyyy-MM-dd');
+       2016-12-31
+  """)
+// scalastyle:on line.size.limit
+case class ParseToDate(left: Expression, format: Expression, child: Expression)
+  extends RuntimeReplaceable {
+
+  def this(left: Expression, format: Expression) = {
+    this(left, format, Cast(Cast(new UnixTimestamp(left, format), TimestampType), DateType))
+  }
+
+  def this(left: Expression) = {
+    // RuntimeReplaceable forces the signature, the second value
+    // is ignored completely
+    this(left, Literal(""), ToDate(left))
+  }
+
+  override def flatArguments: Iterator[Any] = Iterator(left, format)
+  override def sql: String = s"$prettyName(${left.sql}, ${format.sql})"
+
+  override def prettyName: String = "to_date"
+  override def dataType: DataType = DateType
--- End diff --

this is already defined in `RuntimeReplaceable`
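For readers following along: `to_date(date_str, fmt)` parses the string with the given pattern and returns null for unparseable input. A rough Java sketch of that contract (a hypothetical helper, not Spark's implementation, which goes through `UnixTimestamp` and `Cast` as the diff shows):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;

public class ToDateSketch {
    // Returns the "yyyy-MM-dd" rendering of the parsed input, or null when the
    // input does not match the pattern, mirroring to_date's null-on-invalid rule.
    static String toDate(String s, String fmt) {
        SimpleDateFormat in = new SimpleDateFormat(fmt);
        in.setLenient(false);
        try {
            return new SimpleDateFormat("yyyy-MM-dd").format(in.parse(s));
        } catch (ParseException e) {
            return null; // invalid input -> null, like the SQL function
        }
    }

    public static void main(String[] args) {
        System.out.println(toDate("2016-12-31", "yyyy-MM-dd")); // 2016-12-31
        System.out.println(toDate("not-a-date", "yyyy-MM-dd")); // null
    }
}
```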
[GitHub] spark issue #16687: [SPARK-19343][DStreams] Do once optimistic checkpoint be...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16687 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71914/
[GitHub] spark issue #16687: [SPARK-19343][DStreams] Do once optimistic checkpoint be...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16687

**[Test build #71914 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71914/testReport)** for PR 16687 at commit [`a63306e`](https://github.com/apache/spark/commit/a63306e53c19b0db6574260c9716c6a76cf223e0).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16687: [SPARK-19343][DStreams] Do once optimistic checkpoint be...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16687 Merged build finished. Test PASSed.
[GitHub] spark pull request #16138: [SPARK-16609] Add to_date/to_timestamp with forma...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16138#discussion_r97489950

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ---
@@ -1047,6 +1048,64 @@ case class ToDate(child: Expression) extends UnaryExpression with ImplicitCastIn
 }

 /**
+ * Parses a column to a date based on the given format.
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(date_str, fmt) - Parses the `left` expression with the `fmt` expression. Returns null with invalid input.",
+  extended = """
+    Examples:
+      > SELECT _FUNC_('2016-12-31', 'yyyy-MM-dd');
+       2016-12-31
+  """)
+// scalastyle:on line.size.limit
+case class ParseToDate(left: Expression, format: Expression, child: Expression)
--- End diff --

we don't need to put `child` in the constructor, but can simply add a `def child` in the class.
[GitHub] spark issue #16552: [SPARK-19152][SQL]DataFrameWriter.saveAsTable support hi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16552 Merged build finished. Test PASSed.
[GitHub] spark issue #16552: [SPARK-19152][SQL]DataFrameWriter.saveAsTable support hi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16552 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71910/
[GitHub] spark issue #16552: [SPARK-19152][SQL]DataFrameWriter.saveAsTable support hi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16552

**[Test build #71910 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71910/testReport)** for PR 16552 at commit [`f34ab6d`](https://github.com/apache/spark/commit/f34ab6dab0bb7ce80d362c0c248bc2c735aeb60b).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16138: [SPARK-16609] Add to_date/to_timestamp with format funct...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16138 Merged build finished. Test FAILed.
[GitHub] spark issue #16138: [SPARK-16609] Add to_date/to_timestamp with format funct...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16138 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71913/
[GitHub] spark issue #16138: [SPARK-16609] Add to_date/to_timestamp with format funct...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16138

**[Test build #71913 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71913/testReport)** for PR 16138 at commit [`8fa4bfb`](https://github.com/apache/spark/commit/8fa4bfbb72c5c1de214b4a35ef3ed4585e33cf3a).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16594 **[Test build #71921 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71921/testReport)** for PR 16594 at commit [`bd45854`](https://github.com/apache/spark/commit/bd4585442209334e17b50efd2fdc88328ab78c7e).
[GitHub] spark issue #16269: [SPARK-19080][SQL] simplify data source analysis
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16269 Merged build finished. Test PASSed.
[GitHub] spark issue #16269: [SPARK-19080][SQL] simplify data source analysis
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16269 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71908/
[GitHub] spark issue #16269: [SPARK-19080][SQL] simplify data source analysis
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16269

**[Test build #71908 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71908/testReport)** for PR 16269 at commit [`4b68c16`](https://github.com/apache/spark/commit/4b68c168b0e16071b91c93fc7f2be8fabda46fbe).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16654: [SPARK-19303][ML][WIP] Add evaluate method in clustering...
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/16654

Existing metrics (WSSSE, log-likelihood) depend on the details of the algorithm. The computation of WSSSE for KMeans/BisectingKMeans uses the average vectors as the centers, but for KMedoids the medoids, rather than the averages, should be used. If we used the same logic as KMeans to compute the WSSSE for KMedoids, I think it would be a mistake. I also found that some supervised algorithms support an evaluate method in their models: LiR, LoR, GLR.
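A toy illustration of the point (assumed 1-D data, not Spark code): the mean minimizes squared error over all real numbers, while a medoid must be an actual cluster member, so reusing the KMeans center logic for KMedoids reports a cost that no medoid can actually achieve.

```java
public class WssseSketch {
    // Sum of squared distances from each point to the given center.
    static double wssse(double[] points, double center) {
        double s = 0;
        for (double p : points) {
            double d = p - center;
            s += d * d;
        }
        return s;
    }

    public static void main(String[] args) {
        double[] cluster = {0, 1, 5};          // toy 1-D cluster (assumed data)

        // KMeans-style center: the mean of the cluster.
        double mean = (0 + 1 + 5) / 3.0;       // = 2.0
        double byMean = wssse(cluster, mean);  // 4 + 1 + 9 = 14

        // KMedoids-style center: the cluster member minimizing the cost.
        double best = Double.MAX_VALUE;
        for (double c : cluster) {
            best = Math.min(best, wssse(cluster, c));
        }
        // best member is 1, with cost 1 + 0 + 16 = 17 > 14
        System.out.println(byMean + " " + best); // 14.0 17.0
    }
}
```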
[GitHub] spark issue #16553: [SPARK-9435][SQL] Reuse function in Java UDF to correctl...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16553 Thank you @gatorsmile
[GitHub] spark issue #16658: [DOCS] Fix typo in docs
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16658 Merged build finished. Test PASSed.
[GitHub] spark issue #16658: [DOCS] Fix typo in docs
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16658 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71915/
[GitHub] spark issue #16658: [DOCS] Fix typo in docs
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16658

**[Test build #71915 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71915/testReport)** for PR 16658 at commit [`9e1e32a`](https://github.com/apache/spark/commit/9e1e32ab2821503db5236d3c13c9904ad6a641a9).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #16687: [SPARK-19343][DStreams] Do once optimistic checkp...
Github user uncleGen commented on a diff in the pull request: https://github.com/apache/spark/pull/16687#discussion_r97483845

--- Diff: streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala ---
@@ -146,6 +147,11 @@ class JobGenerator(jobScheduler: JobScheduler) extends Logging {
       while (!hasTimedOut && !haveAllBatchesBeenProcessed) {
         Thread.sleep(pollTime)
       }
+      if (shouldCheckpoint
+          && !(lastProcessedBatch - graph.zeroTime).isMultipleOf(ssc.checkpointDuration)) {
+        ssc.graph.updateCheckpointData(lastProcessedBatch)
+        checkpointWriter.write(new Checkpoint(ssc, lastProcessedBatch), false)
+      }
--- End diff --

Do one more checkpoint before stop.
[GitHub] spark issue #16689: SPARK-19342 bug fixed in collect method for collecting t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16689 Can one of the admins verify this patch?
[GitHub] spark pull request #16687: [SPARK-19343][DStreams] Do once optimistic checkp...
Github user uncleGen commented on a diff in the pull request: https://github.com/apache/spark/pull/16687#discussion_r97483687

--- Diff: streaming/src/test/scala/org/apache/spark/streaming/StreamingContextSuite.scala ---
@@ -837,6 +839,29 @@ class StreamingContextSuite extends SparkFunSuite with BeforeAndAfter with Timeo
     assert(latch.await(60, TimeUnit.SECONDS))
   }
+  test("SPARK-19343 Do once optimistic checkpoint before stop") {
+    val testDirectory = Utils.createTempDir().getAbsolutePath()
+    val checkpointDirectory = Utils.createTempDir().getAbsolutePath()
+    ssc = new StreamingContext(conf.clone.set("someKey", "someValue"), batchDuration)
+    ssc.checkpoint(checkpointDirectory)
+    val stream = ssc.textFileStream(testDirectory).checkpoint(batchDuration * 11)
+    stream.foreachRDD { rdd => rdd.count() }
+    ssc.start()
+    try {
+      Thread.sleep(batchDuration.milliseconds * 13)
+      ssc.stop(true, true)
--- End diff --

Sleep for 13 batch durations, so there should be only one checkpoint before this PR.
[GitHub] spark pull request #16689: SPARK-19342 bug fixed in collect method for colle...
GitHub user titicaca opened a pull request: https://github.com/apache/spark/pull/16689

SPARK-19342 bug fixed in collect method for collecting timestamp column

## What changes were proposed in this pull request?

Fix a bug in the collect method for collecting a timestamp column. The bug can be reproduced with the following code and outputs:

```
library(SparkR)
sparkR.session(master = "local")

df <- data.frame(col1 = c(0, 1, 2),
                 col2 = c(as.POSIXct("2017-01-01 00:00:01"), NA, as.POSIXct("2017-01-01 12:00:01")))

sdf1 <- createDataFrame(df)
print(dtypes(sdf1))
df1 <- collect(sdf1)
print(lapply(df1, class))

sdf2 <- filter(sdf1, "col1 > 0")
print(dtypes(sdf2))
df2 <- collect(sdf2)
print(lapply(df2, class))
```

As we can see from the printed output, the column type of col2 in df2 is unexpectedly converted to numeric when NA exists at the top of the column. This is caused by the method `do.call(c, list)`: if we combine a list, i.e. `do.call(c, list(NA, as.POSIXct("2017-01-01 12:00:01")))`, the class of the result is numeric instead of POSIXct. Therefore, we need to cast the data type of the vector explicitly.

## How was this patch tested?

The patch can be tested manually with the same code above.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/titicaca/spark sparkr-dev

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16689.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #16689

commit a51c2eb54ca672ad63495d0709bd3ae7b254bd14
Author: titicaca
Date: 2017-01-24T06:24:47Z

    SPARK-19342 bug fixed in collect method for collecting timestamp column
[GitHub] spark issue #16552: [SPARK-19152][SQL]DataFrameWriter.saveAsTable support hi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16552 **[Test build #71920 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71920/testReport)** for PR 16552 at commit [`7bf5b50`](https://github.com/apache/spark/commit/7bf5b50c5cfba1ecb02b95c2fa9bb1ae7830ca99).
[GitHub] spark issue #16688: [TESTS][SQL] Setup testdata at the beginning for tests t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16688 **[Test build #71919 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71919/testReport)** for PR 16688 at commit [`b71120d`](https://github.com/apache/spark/commit/b71120d562b28c94b8a1b0689b3c2fac11d84a37).
[GitHub] spark pull request #16688: [TESTS][SQL] Setup testdata at the beginning for ...
GitHub user dilipbiswal opened a pull request: https://github.com/apache/spark/pull/16688

[TESTS][SQL] Setup testdata at the beginning for tests to run independently

## What changes were proposed in this pull request?

In CachedTableSuite, we are not setting up the test data at the beginning. Some tests fail when run individually; when running the entire suite they run fine. Here are some of the tests that fail:

- test("SELECT star from cached table")
- test("Self-join cached")

As part of this, simplified a couple of tests by calling a support method to count the number of InMemoryRelations.

## How was this patch tested?

Ran the failing tests individually.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dilipbiswal/spark cachetablesuite_simple

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16688.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #16688

commit b71120d562b28c94b8a1b0689b3c2fac11d84a37
Author: Dilip Biswal
Date: 2017-01-24T06:34:11Z

    Setup testdata at the beginning for tests to run independently
[GitHub] spark issue #16552: [SPARK-19152][SQL]DataFrameWriter.saveAsTable support hi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16552 **[Test build #71918 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71918/testReport)** for PR 16552 at commit [`7bf5b50`](https://github.com/apache/spark/commit/7bf5b50c5cfba1ecb02b95c2fa9bb1ae7830ca99).
[GitHub] spark issue #16552: [SPARK-19152][SQL]DataFrameWriter.saveAsTable support hi...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/16552 retest this please
[GitHub] spark issue #15945: [SPARK-12978][SQL] Merge unnecessary partial aggregates
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15945 **[Test build #71917 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71917/testReport)** for PR 15945 at commit [`bea519f`](https://github.com/apache/spark/commit/bea519f2ba12312ec96884c3545f74b3bc28c4a2).
[GitHub] spark issue #16658: [DOCS] Fix typo in docs
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16658 **[Test build #71915 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71915/testReport)** for PR 16658 at commit [`9e1e32a`](https://github.com/apache/spark/commit/9e1e32ab2821503db5236d3c13c9904ad6a641a9).
[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/16594#discussion_r97482084 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Statistics.scala --- @@ -54,11 +56,32 @@ case class Statistics( /** Readable string representation for the Statistics. */ def simpleString: String = { -Seq(s"sizeInBytes=$sizeInBytes", - if (rowCount.isDefined) s"rowCount=${rowCount.get}" else "", +Seq(s"sizeInBytes=${format(sizeInBytes, isSize = true)}", + if (rowCount.isDefined) s"rowCount=${format(rowCount.get, isSize = false)}" else "", s"isBroadcastable=$isBroadcastable" ).filter(_.nonEmpty).mkString(", ") } + + /** Print the given number in a readable format. */ + def format(number: BigInt, isSize: Boolean): String = { --- End diff -- I'll try to use that method in combination with the current logic, thanks for the reminder.
[GitHub] spark pull request #16308: [SPARK-18936][SQL] Infrastructure for session loc...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/16308#discussion_r97482071
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DateFunctionsSuite.scala ---
@@ -103,6 +153,51 @@ class DateFunctionsSuite extends QueryTest with SharedSQLContext {
     Row("2015", "2015", "2013"))
   }

+  test("date format with session local timezone") {
+    val df = Seq((d, sdf.format(d), ts)).toDF("a", "b", "c")
+
+    // The child of date_format is implicitly casted to TimestampType with session local timezone.
+    //
+    // +---+-----------------------+---------------+-----------------------+
+    // |   | df                    | timestamp     | date_format           |
+    // +---+-----------------------+---------------+-----------------------+
+    // | a | 16533                 | 1428476400000 | "2015-04-08 00:00:00" |
+    // | b | "2015-04-08 13:10:15" | 1428523815000 | "2015-04-08 13:10:15" |
--- End diff --
Do you mean you are wondering why `sdf.format(d)` has the time info `13:10:15`? If so, `java.sql.Date` DOES have the time info if it was initialized with the constructor `Date(long date)`; even if it was initialized with the constructor `Date(int year, int month, int day)` or with `Date.valueOf(String s)`, it has the time info `00:00:00` of the day in the timezone `TimeZone.getDefault()`.
```scala
scala> TimeZone.setDefault(TimeZone.getTimeZone("GMT"))

scala> val gmtDate = Date.valueOf("2017-01-24")
gmtDate: java.sql.Date = 2017-01-24

scala> val gmtTime = gmtDate.getTime
gmtTime: Long = 1485216000000

scala> TimeZone.setDefault(TimeZone.getTimeZone("PST"))

scala> val pstDate = Date.valueOf("2017-01-24")
pstDate: java.sql.Date = 2017-01-24

scala> val pstTime = pstDate.getTime
pstTime: Long = 1485244800000

scala> val sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
sdf: java.text.SimpleDateFormat = java.text.SimpleDateFormat@4f76f1a0

scala> sdf.setTimeZone(TimeZone.getTimeZone("GMT"))

scala> sdf.format(gmtTime)
res12: String = 2017-01-24 00:00:00

scala> sdf.format(pstTime)
res13: String = 2017-01-24 08:00:00

scala> val d = new Date(sdf.parse("2015-04-08 13:10:15").getTime)
d: java.sql.Date = 2015-04-08

scala> sdf.format(d)
res14: String = 2015-04-08 13:10:15
```
[GitHub] spark issue #15880: [SPARK-17913][SQL] compare atomic and string type column...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15880 **[Test build #71916 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71916/testReport)** for PR 15880 at commit [`a11f89b`](https://github.com/apache/spark/commit/a11f89bf5ed13b4061a29daf007a608314465a94).
[GitHub] spark pull request #16683: [SPARK-19268][SS]Disallow adaptive query executio...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16683
[GitHub] spark issue #16171: [SPARK-18739][ML][PYSPARK] Classification and regression...
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/16171 cc @yanboliang @sethah @jkbradley
[GitHub] spark issue #16672: [SPARK-19329][SQL]insert data to a not exist location da...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16672 It seems to me that following Hive is safer. Any other ideas?
[GitHub] spark issue #16606: [SPARK-19246][SQL]CataLogTable's partitionSchema order a...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16606 LGTM, pending tests
[GitHub] spark issue #16683: [SPARK-19268][SS]Disallow adaptive query execution for s...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/16683 Thanks. Merging to master and 2.1.
[GitHub] spark issue #16687: [SPARK-19343][DStreams] Do once optimistic checkpoint be...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16687 **[Test build #71914 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71914/testReport)** for PR 16687 at commit [`a63306e`](https://github.com/apache/spark/commit/a63306e53c19b0db6574260c9716c6a76cf223e0).
[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/16594#discussion_r97481455 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Statistics.scala --- @@ -54,11 +56,32 @@ case class Statistics( /** Readable string representation for the Statistics. */ def simpleString: String = { -Seq(s"sizeInBytes=$sizeInBytes", - if (rowCount.isDefined) s"rowCount=${rowCount.get}" else "", +Seq(s"sizeInBytes=${format(sizeInBytes, isSize = true)}", + if (rowCount.isDefined) s"rowCount=${format(rowCount.get, isSize = false)}" else "", s"isBroadcastable=$isBroadcastable" ).filter(_.nonEmpty).mkString(", ") } + + /** Print the given number in a readable format. */ + def format(number: BigInt, isSize: Boolean): String = { --- End diff -- That method only accepts a Long parameter, and estimated stats can still be unreadable even when using TB as the unit.
[GitHub] spark pull request #16687: [SPARK-19343][DStreams] Do once optimistic checkp...
GitHub user uncleGen opened a pull request: https://github.com/apache/spark/pull/16687

[SPARK-19343][DStreams] Do once optimistic checkpoint before stop

## What changes were proposed in this pull request?

When a streaming job restarts from a checkpoint, it has to rebuild several batches until it finds the latest checkpointed RDD. We can do one optimistic checkpoint just before stop, reducing unnecessary recomputation.

## How was this patch tested?

Added a new unit test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/uncleGen/spark SPARK-19343

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16687.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16687

commit a63306e53c19b0db6574260c9716c6a76cf223e0
Author: uncleGen
Date: 2017-01-24T06:24:08Z

    SPARK-19343: Do once optimistic checkpoint before stop
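The recomputation the description refers to can be illustrated with a toy model (plain Scala, not Spark code; all names here are illustrative): on restart, the driver replays every batch generated after the most recent checkpoint, so one extra checkpoint taken at stop time drives the replayed count to roughly zero.

```scala
// Toy model of restart cost: the number of batches to rebuild is the time
// elapsed between the last checkpoint and the stop, divided by the batch
// interval. These are hypothetical helpers, not Spark Streaming APIs.
def batchesToRebuild(stopTimeMs: Long, lastCheckpointMs: Long, batchIntervalMs: Long): Long =
  (stopTimeMs - lastCheckpointMs) / batchIntervalMs

// Periodic checkpoint every 10s with 1s batches, stopped 9s after the last
// checkpoint: 9 batches must be recomputed on restart.
val withoutFinalCheckpoint = batchesToRebuild(19000L, 10000L, 1000L)

// With one more checkpoint taken right at stop time, nothing is replayed.
val withFinalCheckpoint = batchesToRebuild(19000L, 19000L, 1000L)
```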
[GitHub] spark issue #14038: [SPARK-16317][SQL] Add a new interface to filter files i...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/14038 @liancheng ping
[GitHub] spark pull request #16553: [SPARK-9435][SQL] Reuse function in Java UDF to c...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16553
[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/16594 @gatorsmile I just did a quick fix to show what the improved stats look like. If @rxin @hvanhovell accept the change proposed in this PR, I'll update to remove the flag :)
[GitHub] spark pull request #16552: [SPARK-19152][SQL]DataFrameWriter.saveAsTable sup...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16552#discussion_r97481030 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala --- @@ -1461,6 +1461,25 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton { }) } + test("run sql directly on files - hive") { +withTable("t") { --- End diff -- you don't need to create a table
```
withTempPath { path =>
  spark.range(100).toDF.write.parquet(path.getAbsolutePath)
  ...
  sql(s"select id from hive.`${path.getAbsolutePath}`")
}
```
[GitHub] spark issue #16553: [SPARK-9435][SQL] Reuse function in Java UDF to correctl...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16553 Thanks! Merging to master.
[GitHub] spark pull request #16552: [SPARK-19152][SQL]DataFrameWriter.saveAsTable sup...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16552#discussion_r97480933 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ddl.scala --- @@ -65,6 +65,10 @@ case class CreateTempViewUsing( } def run(sparkSession: SparkSession): Seq[Row] = { +if (provider.toLowerCase == DDLUtils.HIVE_PROVIDER) { + throw new AnalysisException("Currently Hive data source can not be created as a view") --- End diff -- `Hive data source can only be used with tables, you cannot use it with CREATE TEMP VIEW USING`
[GitHub] spark pull request #16552: [SPARK-19152][SQL]DataFrameWriter.saveAsTable sup...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16552#discussion_r97480861 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala --- @@ -112,12 +112,6 @@ case class AnalyzeCreateTable(sparkSession: SparkSession) extends Rule[LogicalPl throw new AnalysisException("Saving data into a view is not allowed.") } - if (DDLUtils.isHiveTable(existingTable)) { -throw new AnalysisException(s"Saving data in the Hive serde table $tableName is " + - "not supported yet. Please use the insertInto() API as an alternative.") - } - - // Check if the specified data source match the data source of the existing table. --- End diff -- why remove this line?
[GitHub] spark pull request #16582: [SPARK-19220][UI] Make redirection to HTTPS apply...
Github user sarutak commented on a diff in the pull request: https://github.com/apache/spark/pull/16582#discussion_r97478738 --- Diff: core/src/main/scala/org/apache/spark/ui/JettyUtils.scala --- @@ -337,17 +350,20 @@ private[spark] object JettyUtils extends Logging { // The number of selectors always equals to the number of acceptors minThreads += connector.getAcceptors * 2 } - server.setConnectors(connectors.toArray) pool.setMaxThreads(math.max(pool.getMaxThreads, minThreads)) val errorHandler = new ErrorHandler() errorHandler.setShowStacks(true) errorHandler.setServer(server) server.addBean(errorHandler) + + gzipHandlers.foreach(collection.addHandler) server.setHandler(collection) + + server.setConnectors(connectors.toArray) --- End diff -- Why did you move `server.setConnectors(connectors.toArray)` and `gzipHandlers.foreach(collection.addHandler)`?
[GitHub] spark pull request #16582: [SPARK-19220][UI] Make redirection to HTTPS apply...
Github user sarutak commented on a diff in the pull request: https://github.com/apache/spark/pull/16582#discussion_r97479049 --- Diff: core/src/main/scala/org/apache/spark/ui/JettyUtils.scala --- @@ -274,25 +277,28 @@ private[spark] object JettyUtils extends Logging { conf: SparkConf, serverName: String = ""): ServerInfo = { -val collection = new ContextHandlerCollection addFilters(handlers, conf) val gzipHandlers = handlers.map { h => + h.setVirtualHosts(Array("@" + SPARK_CONNECTOR_NAME)) --- End diff -- Do we need this code here? `setVirtualHosts` should always be called in `addHandler` for each handler, right?
[GitHub] spark issue #16171: [SPARK-18739][ML][PYSPARK] Classification and regression...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16171 Merged build finished. Test PASSed.
[GitHub] spark issue #16171: [SPARK-18739][ML][PYSPARK] Classification and regression...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16171 **[Test build #71911 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71911/testReport)** for PR 16171 at commit [`b6dd52c`](https://github.com/apache/spark/commit/b6dd52cda34051e5e76df55a76ff83d57fb8a51b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16171: [SPARK-18739][ML][PYSPARK] Classification and regression...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16171 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71911/ Test PASSed.
[GitHub] spark pull request #16661: [SPARK-19313][ML][MLLIB] GaussianMixture should l...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16661#discussion_r97479326 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala --- @@ -486,6 +491,9 @@ class GaussianMixture @Since("2.0.0") ( @Since("2.0.0") object GaussianMixture extends DefaultParamsReadable[GaussianMixture] { + /** Limit number of features such that numFeatures^2^ < Integer.MaxValue */ + private[clustering] val MAX_NUM_FEATURES = 46000 --- End diff -- We have to unpack the covariance matrix to a full covariance matrix before returning the model.
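The 46000 cap in the quoted diff can be sanity-checked directly: a full covariance matrix stores numFeatures² doubles in one array, and JVM arrays are indexed by `Int`, so numFeatures² must stay below `Int.MaxValue`. A quick check (illustrative, independent of the Spark code):

```scala
// Largest n with n * n <= Int.MaxValue; (n + 1)^2 already exceeds it.
val maxIndexable = math.sqrt(Int.MaxValue.toDouble).toInt

// The limit from the diff, chosen with some headroom below that bound.
val maxNumFeatures = 46000

// Compute the squares as Longs to avoid Int overflow in the check itself.
val fits = maxNumFeatures.toLong * maxNumFeatures < Int.MaxValue.toLong
val nextOverflows = (maxIndexable + 1).toLong * (maxIndexable + 1) > Int.MaxValue.toLong
```

So 46000² (2,116,000,000) still fits in an `Int`-indexed array, while anything above 46340 features would not.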
[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16594 I still do not think an internal configuration is a user-friendly way to show the plan costs. Done that way, it looks as if we do not want users to see it.
[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16594#discussion_r97478978 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Statistics.scala --- @@ -54,11 +56,32 @@ case class Statistics( /** Readable string representation for the Statistics. */ def simpleString: String = { -Seq(s"sizeInBytes=$sizeInBytes", - if (rowCount.isDefined) s"rowCount=${rowCount.get}" else "", +Seq(s"sizeInBytes=${format(sizeInBytes, isSize = true)}", + if (rowCount.isDefined) s"rowCount=${format(rowCount.get, isSize = false)}" else "", s"isBroadcastable=$isBroadcastable" ).filter(_.nonEmpty).mkString(", ") } + + /** Print the given number in a readable format. */ + def format(number: BigInt, isSize: Boolean): String = { --- End diff -- We already have [`bytesToString`](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/Utils.scala#L1109-L1132) in Utils.scala
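For comparison, the `Utils.bytesToString` helper referenced above takes a `Long`; a `BigInt`-friendly variant along the same lines could look like the following sketch (illustrative only, not the helper under review; `bytesToReadable` is a hypothetical name):

```scala
// Format a possibly huge byte count with binary units, mirroring the
// KB/MB/... naming convention but accepting BigInt instead of Long.
def bytesToReadable(size: BigInt): String = {
  val units = Seq("B", "KB", "MB", "GB", "TB", "PB", "EB")
  var value = BigDecimal(size)
  var i = 0
  // Divide down by 1024 until the value fits the current unit.
  while (value.abs >= 1024 && i < units.length - 1) {
    value /= 1024
    i += 1
  }
  f"${value.toDouble}%.1f ${units(i)}"
}
```

For example, `bytesToReadable(BigInt(1536))` yields `"1.5 KB"`; values beyond the EB range simply stay in EB, which addresses the concern that TB alone is not enough for estimated stats.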
[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16594 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71906/ Test FAILed.
[GitHub] spark issue #16677: [WIP][SQL] Use map output statistices to improve global ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16677 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71905/ Test FAILed.
[GitHub] spark issue #16677: [WIP][SQL] Use map output statistices to improve global ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16677 Merged build finished. Test FAILed.
[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16594 Merged build finished. Test FAILed.
[GitHub] spark issue #16638: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16638 Merged build finished. Test PASSed.
[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16594 **[Test build #71906 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71906/testReport)** for PR 16594 at commit [`0af8d7f`](https://github.com/apache/spark/commit/0af8d7f410b36547727cb2e6445dccf9d12f2cef). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16638: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16638 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71904/ Test PASSed.
[GitHub] spark issue #16638: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16638 **[Test build #71904 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71904/testReport)** for PR 16638 at commit [`b80f8e6`](https://github.com/apache/spark/commit/b80f8e66e1cbb7111c090358cabc925c6af233d2). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16677: [WIP][SQL] Use map output statistices to improve global ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16677 **[Test build #71905 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71905/testReport)** for PR 16677 at commit [`0a2e96f`](https://github.com/apache/spark/commit/0a2e96fcb42a6fada315fc65a6610314c56ded58). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class FakePartitioning(orgPartition: Partitioning, numPartitions: Int) extends Partitioning ` * `case class LocalLimitExec(limit: Int, child: SparkPlan) extends UnaryExecNode with CodegenSupport ` * `case class GlobalLimitExec(limit: Int, child: SparkPlan) extends UnaryExecNode `
[GitHub] spark issue #16138: [SPARK-16609] Add to_date/to_timestamp with format funct...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16138 **[Test build #71913 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71913/testReport)** for PR 16138 at commit [`8fa4bfb`](https://github.com/apache/spark/commit/8fa4bfbb72c5c1de214b4a35ef3ed4585e33cf3a).
[GitHub] spark pull request #16661: [SPARK-19313][ML][MLLIB] GaussianMixture should l...
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/16661#discussion_r97478414 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala --- @@ -486,6 +491,9 @@ class GaussianMixture @Since("2.0.0") ( @Since("2.0.0") object GaussianMixture extends DefaultParamsReadable[GaussianMixture] { + /** Limit number of features such that numFeatures^2^ < Integer.MaxValue */ + private[clustering] val MAX_NUM_FEATURES = 46000 --- End diff -- In https://github.com/apache/spark/pull/15413, the symmetry of the covariance matrix is taken into account and only the upper triangular part is stored. So this number seems like it should be 65535? (`math.sqrt(Int.MaxValue.toDouble * 2)`)
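The arithmetic behind the two candidate limits can be checked directly. A quick sketch (the variable names are mine, not Spark's):

```python
# Java's Integer.MAX_VALUE, the cap on array lengths discussed above.
INT_MAX = 2**31 - 1

# Dense storage: a full covariance matrix needs numFeatures^2 doubles,
# so numFeatures must stay below sqrt(Int.MaxValue) ~= 46340 -- hence
# the conservative MAX_NUM_FEATURES = 46000 in the diff.
full_matrix_bound = int(INT_MAX ** 0.5)

# Packed storage (the symmetry argument from apache/spark#15413): only the
# upper triangle is kept, numFeatures * (numFeatures + 1) / 2 entries,
# which raises the bound to roughly sqrt(2 * Int.MaxValue) ~= 65535,
# the figure suggested in the comment.
packed_bound = int((2 * INT_MAX) ** 0.5)

print(full_matrix_bound, packed_bound)  # 46340 65535
```

Both bounds satisfy the array-size constraint: `46000**2` and `65535 * 65536 / 2` are each below `Int.MaxValue`.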
[GitHub] spark issue #15880: [SPARK-17913][SQL] compare long and string type column m...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15880 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71907/ Test FAILed.
[GitHub] spark issue #15880: [SPARK-17913][SQL] compare long and string type column m...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15880 Merged build finished. Test FAILed.
[GitHub] spark issue #15880: [SPARK-17913][SQL] compare long and string type column m...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15880 **[Test build #71907 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71907/testReport)** for PR 15880 at commit [`32e4f52`](https://github.com/apache/spark/commit/32e4f52a7673d1d1f573b9d83177c093327d). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16138: [SPARK-16609] Add to_date/to_timestamp with format funct...
Github user anabranch commented on the issue: https://github.com/apache/spark/pull/16138 @cloud-fan - Reynold referred me to you for this test failure. My two tests are failing because Hive tests *allegedly* cover something like this. ``` SELECT to_date('2001-10-30 10:30:00', '') ``` However, Hive doesn't support passing multiple parameters to `to_date`, as specified in the [language manual](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions). The only instance I see of `to_date` with multiple parameters is [when it talks to an Oracle DB as the metastore](https://github.com/apache/hive/blob/2d813f4d4a0bb42345d153c362f7416f05ab2749/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L1122), although I don't know the code base (I grepped for `to_date` and tried to find instances of this occurring). It seems like this test case should not be running in the first place. Do you have any suggestions?
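For context, the behavior the two-argument `to_date` is meant to have (parse with the given format, return null on invalid input, keep only the date part) can be sketched with the Python standard library. This is my rough analogue, not Spark code; Spark takes Java `SimpleDateFormat` patterns like `yyyy-MM-dd`, so Python's `strptime` codes (`%Y-%m-%d`) stand in for them here:

```python
from datetime import date, datetime

def to_date_with_format(date_str, fmt):
    # Illustrative analogue of two-argument to_date: parse the string with
    # the supplied format, keep only the date part, and return None (null)
    # instead of raising on input that does not match the format.
    try:
        return datetime.strptime(date_str, fmt).date()
    except ValueError:
        return None

print(to_date_with_format("2016-12-31", "%Y-%m-%d"))  # 2016-12-31
print(to_date_with_format("not a date", "%Y-%m-%d"))  # None
```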
[GitHub] spark issue #16606: [SPARK-19246][SQL]CataLogTable's partitionSchema order a...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16606 **[Test build #71912 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71912/testReport)** for PR 16606 at commit [`72164eb`](https://github.com/apache/spark/commit/72164eb02c1b7acd836a5038fddb8bcd8225a1c6).
[GitHub] spark issue #16171: [SPARK-18739][ML][PYSPARK] Classification and regression...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16171 Merged build finished. Test PASSed.
[GitHub] spark issue #16171: [SPARK-18739][ML][PYSPARK] Classification and regression...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16171 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71909/ Test PASSed.
[GitHub] spark issue #16171: [SPARK-18739][ML][PYSPARK] Classification and regression...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16171 **[Test build #71909 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71909/testReport)** for PR 16171 at commit [`863c9f4`](https://github.com/apache/spark/commit/863c9f45b0ccf066e34d7539ca1f29baf0b49e85). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class GBTClassificationModel(TreeEnsembleModel, JavaProbabilisticClassificationModel,`
[GitHub] spark issue #16171: [SPARK-18739][ML][PYSPARK] Classification and regression...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16171 **[Test build #71911 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71911/testReport)** for PR 16171 at commit [`b6dd52c`](https://github.com/apache/spark/commit/b6dd52cda34051e5e76df55a76ff83d57fb8a51b).
[GitHub] spark issue #15880: [SPARK-17913][SQL] compare long and string type column m...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15880 LGTM. The PR title is not right, though. BTW, we might need a release note for this PR, since it will change behavior.
[GitHub] spark issue #12135: [SPARK-14352][SQL] approxQuantile should support multi c...
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/12135 @MLnick @jkbradley Would you mind making a final pass?
[GitHub] spark issue #16138: [SPARK-16609] Add to_date/to_timestamp with format funct...
Github user anabranch commented on the issue: https://github.com/apache/spark/pull/16138 @felixcheung Thank you for your feedback! Small request: can you tell me if my R test case is sufficient for this? It doesn't seem like there is extensive R testing right now for virtually any function. Obviously tests will pass soon; I'm running into strange edge cases.
[GitHub] spark issue #16606: [SPARK-19246][SQL]CataLogTable's partitionSchema order a...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16606 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71902/ Test PASSed.
[GitHub] spark issue #16606: [SPARK-19246][SQL]CataLogTable's partitionSchema order a...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16606 Merged build finished. Test PASSed.
[GitHub] spark issue #16606: [SPARK-19246][SQL]CataLogTable's partitionSchema order a...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16606 **[Test build #71902 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71902/testReport)** for PR 16606 at commit [`04d3940`](https://github.com/apache/spark/commit/04d39406cc5ce43e51adc931b5dc012d6e5fefa9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16552: [SPARK-19152][SQL]DataFrameWriter.saveAsTable support hi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16552 **[Test build #71910 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71910/testReport)** for PR 16552 at commit [`f34ab6d`](https://github.com/apache/spark/commit/f34ab6dab0bb7ce80d362c0c248bc2c735aeb60b).
[GitHub] spark issue #14872: [SPARK-3162][MLlib][WIP] Add local tree training for dec...
Github user smurching commented on the issue: https://github.com/apache/spark/pull/14872 No worries, apologies for being busy on my end -- I'll leave the branch up and try to contribute in other ways when I have the time!
[GitHub] spark issue #16171: [SPARK-18739][ML][PYSPARK] Classification and regression...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16171 **[Test build #71909 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71909/testReport)** for PR 16171 at commit [`863c9f4`](https://github.com/apache/spark/commit/863c9f45b0ccf066e34d7539ca1f29baf0b49e85).
[GitHub] spark pull request #16668: [SPARK-18788][SPARKR] Add API for getNumPartition...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16668#discussion_r97475352 --- Diff: R/pkg/R/DataFrame.R --- @@ -3406,3 +3406,28 @@ setMethod("randomSplit", } sapply(sdfs, dataFrame) }) + +#' getNumPartitions +#' +#' Return the number of partitions +#' Note: in order to compute the number of partition the SparkDataFrame has to be converted into a +#' RDD temporarily internally. +#' +#' @param x A SparkDataFrame +#' @family SparkDataFrame functions +#' @aliases getNumPartitions,SparkDataFrame-method +#' @rdname getNumPartitions +#' @name getNumPartitions +#' @export +#' @examples +#'\dontrun{ +#' sparkR.session() +#' df <- createDataFrame(cars, numPartitions = 2) +#' getNumPartitions(df) +#' } +#' @note getNumPartitions since 2.1.1 +setMethod("getNumPartitions", + signature(x = "SparkDataFrame"), + function(x) { +getNumPartitionsRDD(toRDD(x)) --- End diff -- you said this filled a hole for Spark 2.1; what's this hole? Is this SparkR only?
[GitHub] spark pull request #16668: [SPARK-18788][SPARKR] Add API for getNumPartition...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16668#discussion_r97475188 --- Diff: R/pkg/R/DataFrame.R --- @@ -3406,3 +3406,28 @@ setMethod("randomSplit", } sapply(sdfs, dataFrame) }) + +#' getNumPartitions +#' +#' Return the number of partitions +#' Note: in order to compute the number of partition the SparkDataFrame has to be converted into a +#' RDD temporarily internally. +#' +#' @param x A SparkDataFrame +#' @family SparkDataFrame functions +#' @aliases getNumPartitions,SparkDataFrame-method +#' @rdname getNumPartitions +#' @name getNumPartitions +#' @export +#' @examples +#'\dontrun{ +#' sparkR.session() +#' df <- createDataFrame(cars, numPartitions = 2) +#' getNumPartitions(df) +#' } +#' @note getNumPartitions since 2.1.1 +setMethod("getNumPartitions", + signature(x = "SparkDataFrame"), + function(x) { +getNumPartitionsRDD(toRDD(x)) --- End diff -- ah, that we could do easily. Is that something that's ok for Spark 2.1.1? If yes, I could go ahead with changes here for Scala, Python and R.
[GitHub] spark pull request #16668: [SPARK-18788][SPARKR] Add API for getNumPartition...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16668#discussion_r97474262 --- Diff: R/pkg/R/DataFrame.R --- @@ -3406,3 +3406,28 @@ setMethod("randomSplit", } sapply(sdfs, dataFrame) }) + +#' getNumPartitions +#' +#' Return the number of partitions +#' Note: in order to compute the number of partition the SparkDataFrame has to be converted into a +#' RDD temporarily internally. +#' +#' @param x A SparkDataFrame +#' @family SparkDataFrame functions +#' @aliases getNumPartitions,SparkDataFrame-method +#' @rdname getNumPartitions +#' @name getNumPartitions +#' @export +#' @examples +#'\dontrun{ +#' sparkR.session() +#' df <- createDataFrame(cars, numPartitions = 2) +#' getNumPartitions(df) +#' } +#' @note getNumPartitions since 2.1.1 +setMethod("getNumPartitions", + signature(x = "SparkDataFrame"), + function(x) { +getNumPartitionsRDD(toRDD(x)) --- End diff -- isn't it just calling `rdd.numPartitions`? We need to materialize the RDD inside the DataFrame anyway, but it's cheap on the Scala side.
[GitHub] spark pull request #16668: [SPARK-18788][SPARKR] Add API for getNumPartition...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16668#discussion_r97473647 --- Diff: R/pkg/R/DataFrame.R --- @@ -3406,3 +3406,28 @@ setMethod("randomSplit", } sapply(sdfs, dataFrame) }) + +#' getNumPartitions +#' +#' Return the number of partitions +#' Note: in order to compute the number of partition the SparkDataFrame has to be converted into a +#' RDD temporarily internally. +#' +#' @param x A SparkDataFrame +#' @family SparkDataFrame functions +#' @aliases getNumPartitions,SparkDataFrame-method +#' @rdname getNumPartitions +#' @name getNumPartitions +#' @export +#' @examples +#'\dontrun{ +#' sparkR.session() +#' df <- createDataFrame(cars, numPartitions = 2) +#' getNumPartitions(df) +#' } +#' @note getNumPartitions since 2.1.1 +setMethod("getNumPartitions", + signature(x = "SparkDataFrame"), + function(x) { +getNumPartitionsRDD(toRDD(x)) --- End diff -- Given this is a bit of a hole, I think it would be worthwhile to consider whether there is a reasonable workaround for the 2.1.1 release (say, a JVM wrapper for `.rdd.getNumPartitions`), @shivaram would you agree? As for the new Scala API, since it has broader implications it might be something to target at the 2.2 release? If so, that would be better served in a different PR. I don't mind taking a shot at that - I'm not super familiar with that code and from a quick scan it seems to be non-trivial (handling different RDD subtypes and so on), so a few pointers would be appreciated, @cloud-fan
[GitHub] spark pull request #16679: [SPARK-19272][SQL] Remove the param `viewOriginal...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16679
[GitHub] spark issue #16679: [SPARK-19272][SQL] Remove the param `viewOriginalText` f...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16679 thanks, merging to master!
[GitHub] spark issue #16679: [SPARK-19272][SQL] Remove the param `viewOriginalText` f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16679 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71900/ Test PASSed.
[GitHub] spark issue #16679: [SPARK-19272][SQL] Remove the param `viewOriginalText` f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16679 Merged build finished. Test PASSed.
[GitHub] spark issue #16269: [SPARK-19080][SQL] simplify data source analysis
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16269 **[Test build #71908 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71908/testReport)** for PR 16269 at commit [`4b68c16`](https://github.com/apache/spark/commit/4b68c168b0e16071b91c93fc7f2be8fabda46fbe).
[GitHub] spark issue #16679: [SPARK-19272][SQL] Remove the param `viewOriginalText` f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16679 **[Test build #71900 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71900/testReport)** for PR 16679 at commit [`b5a48da`](https://github.com/apache/spark/commit/b5a48daab41f8462843a062475413400482d1213). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16638: [SPARK-19115] [SQL] Supporting Create External Ta...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16638#discussion_r97471913 --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 --- @@ -81,8 +81,8 @@ statement rowFormat? createFileFormat? locationSpec? (TBLPROPERTIES tablePropertyList)? (AS? query)? #createHiveTable -| CREATE TABLE (IF NOT EXISTS)? target=tableIdentifier -LIKE source=tableIdentifier #createTableLike +| CREATE EXTERNAL? TABLE (IF NOT EXISTS)? target=tableIdentifier --- End diff -- ok, then let's simplify the logic: if a `location` is specified, we create an external table internally; else, we create a managed table.
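The simplified rule proposed here reduces to a single branch. A minimal sketch (function and value names are illustrative, not Spark's):

```python
def table_type_for_create_like(location=None):
    # Sketch of the proposed rule for CREATE TABLE ... LIKE:
    # the new table is EXTERNAL exactly when a LOCATION clause is given,
    # and MANAGED otherwise.
    return "EXTERNAL" if location is not None else "MANAGED"

print(table_type_for_create_like("/warehouse/custom/path"))  # EXTERNAL
print(table_type_for_create_like())                          # MANAGED
```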
[GitHub] spark issue #15880: [SPARK-17913][SQL] compare long and string type column m...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15880 **[Test build #71907 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71907/testReport)** for PR 15880 at commit [`32e4f52`](https://github.com/apache/spark/commit/32e4f52a7673d1d1f573b9d83177c093327d).