[GitHub] spark issue #16579: [SPARK-19218][SQL] Fix SET command to show a result corr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16579 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16579: [SPARK-19218][SQL] Fix SET command to show a result corr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16579 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71824/
[GitHub] spark issue #16579: [SPARK-19218][SQL] Fix SET command to show a result corr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16579 **[Test build #71824 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71824/testReport)** for PR 16579 at commit [`7879201`](https://github.com/apache/spark/commit/7879201961b0f0caa997c9fe6446c0b1b46124f8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16638: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/16638 I am sorry that I didn't grasp the key points of your question. In Hive, if there are data files under the specified path when creating an external table, Hive identifies those files as the table's data files. In many Spark applications, external table data is generated by other applications under the external table path. So Hive does nothing with the directory specified in the LOCATION. Thank you for your patience and guidance. @gatorsmile
[GitHub] spark issue #16642: [SPARK-19284][SQL]append to partitioned datasource table...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16642 **[Test build #71829 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71829/testReport)** for PR 16642 at commit [`c200b98`](https://github.com/apache/spark/commit/c200b986fed37015a30f99ba2f870dda84cc2ef6).
[GitHub] spark issue #16566: [SPARK-18821][SparkR]: Bisecting k-means wrapper in Spar...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16566 **[Test build #71828 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71828/testReport)** for PR 16566 at commit [`d36c23a`](https://github.com/apache/spark/commit/d36c23a3736cf985c9692f4a14e00945a2d38732).
[GitHub] spark pull request #16642: [SPARK-19284][SQL]append to partitioned datasourc...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16642#discussion_r97262909 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/PartitionedWriteSuite.scala --- @@ -92,6 +111,16 @@ class PartitionedWriteSuite extends QueryTest with SharedSQLContext { } } + test("append data to an existed partitioned table without custom partition path") { +withTable("t") { + withSQLConf("spark.sql.sources.commitProtocolClass" -> --- End diff -- nit: SQLConf.FILE_COMMIT_PROTOCOL_CLASS.key -> classOf[OnlyDetectCustomPathFileCommitProtocol].getName
[GitHub] spark pull request #16642: [SPARK-19284][SQL]append to partitioned datasourc...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/16642#discussion_r97262157 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/PartitionedWriteSuite.scala --- @@ -92,6 +96,47 @@ class PartitionedWriteSuite extends QueryTest with SharedSQLContext { } } + test("append data an existed partition in a datasource table," + --- End diff -- thanks~
[GitHub] spark pull request #16642: [SPARK-19284][SQL]append to partitioned datasourc...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/16642#discussion_r97262179 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/PartitionedWriteSuite.scala --- @@ -92,6 +96,47 @@ class PartitionedWriteSuite extends QueryTest with SharedSQLContext { } } + test("append data an existed partition in a datasource table," + +"custom location sent to Task should be None ") { +withTable("t") { + Seq((1, 2)).toDF("a", "b").write.partitionBy("b").saveAsTable("t") + val writer = Seq((3, 2)).toDF("a", "b").write.mode("append").partitionBy("b") + + spark.sessionState.executePlan(writer.createTableCommand(TableIdentifier("t"))) --- End diff -- good idea, thanks!
[GitHub] spark issue #16642: [SPARK-19284][SQL]append to partitioned datasource table...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16642 **[Test build #71827 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71827/testReport)** for PR 16642 at commit [`aff53dc`](https://github.com/apache/spark/commit/aff53dc40330176987056a827f81a01419ce1e1e).
[GitHub] spark issue #16521: [SPARK-19139][core] New auth mechanism for transport lib...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/16521 Made one pass. Looks good overall. Just some nits.
[GitHub] spark issue #16666: [SPARK-19319][SparkR]:SparkR Kmeans summary returns erro...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/1 **[Test build #71826 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71826/testReport)** for PR 1 at commit [`d1a2d6c`](https://github.com/apache/spark/commit/d1a2d6c9a83adc184dcc88ec3fd78b63ede39b89).
[GitHub] spark pull request #16666: [SPARK-19319][SparkR]:SparkR Kmeans summary retur...
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/1#discussion_r97260863 --- Diff: R/pkg/R/mllib_clustering.R --- @@ -225,10 +225,12 @@ setMethod("spark.kmeans", signature(data = "SparkDataFrame", formula = "formula" #' @param object a fitted k-means model. #' @return \code{summary} returns summary information of the fitted model, which is a list. -#' The list includes the model's \code{k} (number of cluster centers), +#' The list includes the model's \code{k} (the configured number of cluster centers), #' \code{coefficients} (model cluster centers), -#' \code{size} (number of data points in each cluster), and \code{cluster} -#' (cluster centers of the transformed data). +#' \code{size} (number of data points in each cluster), \code{cluster} +#' (cluster centers of the transformed data), and \code{clusterSize} +#' (the actual number of cluster centers. When using initMode = "random", --- End diff -- OK. I will add it. For bisecting k-means, I haven't found a case like this. This case only occurs when initMode is random, and the behavior is due to a fix in the k-means implementation.
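The behavior discussed in this review thread (with `initMode = "random"`, some of the `k` requested centers can end up with no points assigned, so the actual number of non-empty clusters is smaller than `k`) can be illustrated with a toy sketch. This is not Spark's k-means implementation; the data and centers below are made up purely for illustration:

```python
# Toy illustration, NOT Spark's k-means: one "randomly" initialized center
# sits far from all points, so no point is assigned to it and the actual
# number of non-empty clusters drops below the configured k.
points = [0.0, 0.1, 0.2, 9.9, 10.0]
centers = [5.0, 100.0]  # k = 2, but the second center attracts no points

# Assign each point to its nearest center (one Lloyd's-style assignment step).
assignments = [min(range(len(centers)), key=lambda i: abs(p - centers[i]))
               for p in points]

cluster_size = len(set(assignments))  # actual number of non-empty clusters
print(f"configured k = {len(centers)}, non-empty clusters = {cluster_size}")
```

Every point is nearest to the first center, so only one cluster is non-empty even though `k = 2` was requested, which is why reporting both `k` and `clusterSize` in the summary is useful.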
[GitHub] spark issue #16652: [SPARK-19234][MLLib] AFTSurvivalRegression should fail f...
Github user admackin commented on the issue: https://github.com/apache/spark/pull/16652 I've addressed all the problems, I think: code style now fixed, MLTestingUtils patched (and verified all MLlib test cases still pass), and added a test case for zero-valued labels.
[GitHub] spark issue #16638: spark-19115
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16638 Please keep updating your PR description. For example, this PR is not relying on `manual tests`. In addition, you also need to summarize what this PR did. List more details to help reviewers understand your changes and impacts. Thanks!
[GitHub] spark issue #16638: spark-19115
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16638 Let me rephrase it. If the directory specified in the `LOCATION` spec contains other files, how does Hive behave?
[GitHub] spark issue #16638: spark-19115
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16638 First, please change the PR title to `[SPARK-19115] [SQL] Supporting Create External Table Like Location`
[GitHub] spark issue #16645: [SPARK-19290][SQL] add a new extending interface in Anal...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16645 **[Test build #71825 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71825/testReport)** for PR 16645 at commit [`c55a1f9`](https://github.com/apache/spark/commit/c55a1f95491b10208ccd2cdf5910e6ec813c3522).
[GitHub] spark issue #16671: [SPARK-19327][SparkSQL] a better balance partition metho...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16671 Connectors from some DBMS vendors use the UNLOAD utility, which performs much better, and build the RDD inside the connector. Normally, JDBC is not a good option for fetching and writing large tables.
[GitHub] spark issue #16654: [SPARK-19303][ML][WIP] Add evaluate method in clustering...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16654 Metrics evaluate the clustering though; the details of the algorithm are irrelevant. This still clusters points in a continuous space so you can measure WSSSE.
[GitHub] spark issue #16579: [SPARK-19218][SQL] Fix SET command to show a result corr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16579 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71821/
[GitHub] spark issue #16579: [SPARK-19218][SQL] Fix SET command to show a result corr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16579 Merged build finished. Test PASSed.
[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16594 :- ) No perfect solution, but we should use the [metric prefix](https://en.wikipedia.org/wiki/Metric_prefix) when the number is huge.
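As a sketch of what metric-prefix formatting for huge row-count estimates could look like (a hypothetical helper for illustration, not Spark's actual formatting code):

```python
def metric_prefix(n):
    """Format a large count with a metric prefix (K, M, G, ...) for readability.

    Hypothetical helper for illustration; not Spark's actual implementation.
    """
    prefixes = ["", "K", "M", "G", "T", "P", "E"]
    value, i = float(n), 0
    # Divide by 1000 until the value fits, tracking which prefix we reached.
    while abs(value) >= 1000 and i < len(prefixes) - 1:
        value /= 1000
        i += 1
    return str(n) if i == 0 else f"{value:.1f}{prefixes[i]}"

print(metric_prefix(1234))           # 1.2K
print(metric_prefix(7_500_000_000))  # 7.5G
print(metric_prefix(42))             # 42
```

Small counts pass through unchanged, while multi-billion-row estimates collapse to a short, readable token in the plan output.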
[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16594 SQL Server has three ways to show the plan: graphical plans, text plans, and XML plans. Actually, it is pretty advanced. When using the text plans, users can set the output formats: 1. SHOWPLAN_ALL - A reasonably complete set of data showing the estimated execution plan for the query. 2. SHOWPLAN_TEXT - Provides a very limited set of data for use with tools like osql.exe. It, too, only shows the estimated execution plan. 3. STATISTICS PROFILE - Similar to SHOWPLAN_ALL except it represents the data for the actual execution plan. I found a 300-page book, `SQL Server Execution Plans`. For details, you can [download and read it](http://download.red-gate.com/ebooks/SQL/eBOOK_SQLServerExecutionPlans_2Ed_G_Fritchey.pdf).
[GitHub] spark issue #16675: [SPARK-19155][ML] Make family case insensitive in GLM
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16675 @yanboliang Thanks. Seems to have passed tests.
[GitHub] spark pull request #16659: [SPARK-19309][SQL] disable common subexpression e...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16659
[GitHub] spark issue #16579: [SPARK-19218][SQL] Fix SET command to show a result corr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16579 **[Test build #71824 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71824/testReport)** for PR 16579 at commit [`7879201`](https://github.com/apache/spark/commit/7879201961b0f0caa997c9fe6446c0b1b46124f8).
[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16659 thanks, merging to master!
[GitHub] spark issue #16579: [SPARK-19218][SQL] Fix SET command to show a result corr...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16579 The only failure is irrelevant to this PR.
```
[info] - set spark.sql.warehouse.dir *** FAILED *** (5 minutes, 0 seconds)
[info]   Timeout of './bin/spark-submit' '--class' 'org.apache.spark.sql.hive.SetWarehouseLocationTest' '--name' 'SetSparkWarehouseLocationTest' '--master' 'local-cluster[2,1,1024]' '--conf'
```
[GitHub] spark pull request #16668: [SPARK-18788][SPARKR] Add API for getNumPartition...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16668#discussion_r97254989 --- Diff: R/pkg/R/DataFrame.R --- @@ -3406,3 +3406,28 @@ setMethod("randomSplit", } sapply(sdfs, dataFrame) }) + +#' getNumPartitions +#' +#' Return the number of partitions +#' Note: in order to compute the number of partition the SparkDataFrame has to be converted into a +#' RDD temporarily internally. +#' +#' @param x A SparkDataFrame +#' @family SparkDataFrame functions +#' @aliases getNumPartitions,SparkDataFrame-method +#' @rdname getNumPartitions +#' @name getNumPartitions +#' @export +#' @examples +#'\dontrun{ +#' sparkR.session() +#' df <- createDataFrame(cars, numPartitions = 2) +#' getNumPartitions(df) +#' } +#' @note getNumPartitions since 2.1.1 +setMethod("getNumPartitions", + signature(x = "SparkDataFrame"), + function(x) { +getNumPartitionsRDD(toRDD(x)) --- End diff -- shall we add the `getNumPartitions` to `DataFrame/Dataset` at scala side?
[GitHub] spark issue #16579: [SPARK-19218][SQL] Fix SET command to show a result corr...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16579 Retest this please.
[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16594 As of MySQL 5.7.3, the EXPLAIN statement is changed so that the effect of the EXTENDED keyword is always enabled.
```
mysql> EXPLAIN EXTENDED
    -> SELECT t1.a, t1.a IN (SELECT t2.a FROM t2) FROM t1\G
*** 1. row ***
           id: 1
  select_type: PRIMARY
        table: t1
         type: index
possible_keys: NULL
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 4
     filtered: 100.00
        Extra: Using index
*** 2. row ***
           id: 2
  select_type: SUBQUERY
        table: t2
         type: index
possible_keys: a
          key: a
      key_len: 5
          ref: NULL
         rows: 3
     filtered: 100.00
        Extra: Using index
2 rows in set, 1 warning (0.00 sec)
```
[GitHub] spark issue #16579: [SPARK-19218][SQL] Fix SET command to show a result corr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16579 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71822/
[GitHub] spark issue #16579: [SPARK-19218][SQL] Fix SET command to show a result corr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16579 Merged build finished. Test FAILed.
[GitHub] spark issue #16579: [SPARK-19218][SQL] Fix SET command to show a result corr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16579 **[Test build #71822 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71822/testReport)** for PR 16579 at commit [`7879201`](https://github.com/apache/spark/commit/7879201961b0f0caa997c9fe6446c0b1b46124f8). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16659 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71818/
[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16659 Merged build finished. Test PASSed.
[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16659 **[Test build #71818 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71818/testReport)** for PR 16659 at commit [`0753ee6`](https://github.com/apache/spark/commit/0753ee6da4d5698d3a30d89e60ec45aca9e18f35). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16594 PostgreSQL has [a few different options in the EXPLAIN command](https://www.postgresql.org/docs/9.3/static/sql-explain.html):

```
EXPLAIN SELECT * FROM foo WHERE i = 4;

                          QUERY PLAN
--------------------------------------------------------------
 Index Scan using fi on foo  (cost=0.00..5.98 rows=1 width=4)
   Index Cond: (i = 4)
(2 rows)
```

The same plan with cost estimates suppressed:

```
EXPLAIN (COSTS FALSE) SELECT * FROM foo WHERE i = 4;

         QUERY PLAN
----------------------------
 Index Scan using fi on foo
   Index Cond: (i = 4)
(2 rows)
```
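For a quick, runnable analogy to the plan displays above — not Spark or PostgreSQL, but SQLite via Python's `sqlite3` — here is a sketch that reuses the table and index names from the example (`foo`, `fi`); the schema and query are assumptions carried over from the snippet, not anything in the PR:

```python
import sqlite3

# In-memory database mirroring the PostgreSQL example: table foo, index fi on column i.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (i INTEGER)")
conn.execute("CREATE INDEX fi ON foo (i)")

# EXPLAIN QUERY PLAN is SQLite's rough analogue of PostgreSQL's EXPLAIN output.
plan = conn.execute("EXPLAIN QUERY PLAN SELECT * FROM foo WHERE i = 4").fetchall()
for row in plan:
    # The last column is the human-readable plan step; it should mention the index fi.
    print(row[-1])
```

The exact wording of the plan line varies between SQLite versions, which is one reason engines like PostgreSQL offer options (such as `COSTS FALSE`) to control how much of the plan is shown.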
[GitHub] spark pull request #16552: [SPARK-19152][SQL]DataFrameWriter.saveAsTable sup...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16552#discussion_r97253775

```
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
@@ -1353,6 +1353,15 @@ class HiveDDLSuite
     sql("INSERT INTO t SELECT 2, 'b'")
     checkAnswer(spark.table("t"), Row(9, "x") :: Row(2, "b") :: Nil)

+    Seq(10 -> "y").toDF("i", "j")
```

--- End diff --

Please add a new test that appends to a Hive table; also test appending to a data source table with the Hive provider, and check the error message.
[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16594 DB2 has a tool to format the contents of the EXPLAIN tables. Below is an example of the output with explanation: ![screenshot 2017-01-22 21 05 45](https://cloud.githubusercontent.com/assets/11567269/22192191/b054c198-e0e6-11e6-8d64-807c5e196e1b.png)
[GitHub] spark issue #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16344 Merged build finished. Test PASSed.
[GitHub] spark issue #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16344 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71823/
[GitHub] spark issue #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16344 **[Test build #71823 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71823/testReport)** for PR 16344 at commit [`54da2cb`](https://github.com/apache/spark/commit/54da2cbbb53ddde3a91ef6d0d98128d8c7f3deb8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16672: [SPARK-19329][SQL]insert data to a not exist location da...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/16672 In Hive:

1. Reading a table with a non-existent path throws no exception and returns 0 rows.
2. Reading a table with a non-permitted path throws a runtime exception:

```
FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: Unable to determine if hdfs:/tmp/noownerpermission is encrypted: org.apache.hadoop.security.AccessControlException: Permission denied: user=test, access=READ, inode="/tmp/noownerpermission":hadoop:hadoop:drwxr-x--x
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:320)
```

3. Writing to a non-existent path creates it and inserts the data; everything is OK.
4. Writing to a non-permitted path throws an exception.
5. `ALTER TABLE ... SET LOCATION` pointing at a non-permitted path succeeds.
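Behavior 1 above (missing location → zero rows, no error) can be mimicked against a local filesystem path. This is a minimal sketch with a hypothetical `read_rows` helper — it is not Hive or Spark API, just an illustration of the "return empty instead of raising" pattern under discussion:

```python
import os

def read_rows(path):
    """Return the file's lines as 'rows'. A non-existent path yields zero rows
    instead of raising, mirroring Hive's read behavior #1 above."""
    if not os.path.exists(path):
        return []
    with open(path) as f:
        return [line.rstrip("\n") for line in f]

# A path that is assumed not to exist on this machine.
print(len(read_rows("/tmp/definitely-missing-path-for-demo")))
```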
[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16659 LGTM pending test.
[GitHub] spark issue #16587: [SPARK-19229] [SQL] Disallow Creating Hive Source Tables...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16587 Thanks! Merging to master.
[GitHub] spark issue #16579: [SPARK-19218][SQL] Fix SET command to show a result corr...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16579 LGTM pending test.
[GitHub] spark issue #16669: [SPARK-16101][SQL] Refactoring CSV read path to be consi...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16669 Thanks, merging to master!
[GitHub] spark issue #16675: [SPARK-19155][ML] Make family case insensitive in GLM
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16675 Merged build finished. Test PASSed.
[GitHub] spark issue #16675: [SPARK-19155][ML] Make family case insensitive in GLM
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16675 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71820/
[GitHub] spark issue #16675: [SPARK-19155][ML] Make family case insensitive in GLM
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16675 **[Test build #71820 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71820/testReport)** for PR 16675 at commit [`97b0a1c`](https://github.com/apache/spark/commit/97b0a1c9e5f7bfdae2407d5017418f3dda9a1e71). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16669: [SPARK-16101][SQL] Refactoring CSV read path to b...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16669
[GitHub] spark pull request #16587: [SPARK-19229] [SQL] Disallow Creating Hive Source...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16587
[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16594 Let us do some research on how other RDBMSs handle this. For example, Oracle:

```
SQL> explain plan for select * from product;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------
Plan hash value: 3917577207
--------------------------------------------------------
| Id  | Operation          | Name    | Rows  | Bytes |
--------------------------------------------------------
|   0 | SELECT STATEMENT   |         | 15856 |  1254K|
|   1 |  TABLE ACCESS FULL | PRODUCT | 15856 |  1254K|
--------------------------------------------------------
```
[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/16594 @rxin Can we add a flag to enable or disable it? Currently there's no other way to see size and row count except debugging.
[GitHub] spark issue #16671: [SPARK-19327][SparkSQL] a better balance partition metho...
Github user djvulee commented on the issue: https://github.com/apache/spark/pull/16671 @HyukjinKwon One assumption behind this design is that the specified column is indexed in most real scenarios, so the table-scan cost is not very high. What I observed is that most large tables are sharded, so the count cost is acceptable; this is why we spend less time on a 5M-row table than on a 1M-row table. If we use `repartition`, there is a bottleneck when loading data from the DB and a high cost for the `repartition` itself. Anyway, this solution is indeed expensive and not a good one; maybe the best way is to use the Spark connectors provided by the DBMS vendors, as @gatorsmile suggested.
[GitHub] spark issue #16587: [SPARK-19229] [SQL] Disallow Creating Hive Source Tables...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16587 LGTM
[GitHub] spark issue #16579: [SPARK-19218][SQL] Fix SET command to show a result corr...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16579 LGTM
[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16594 Sorry, this explain plan makes no sense -- it is impossible to read.
[GitHub] spark pull request #16579: [SPARK-19218][SQL] Fix SET command to show a resu...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16579#discussion_r97250719

```
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -982,6 +982,33 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
     spark.sessionState.conf.clear()
   }

+  test("SPARK-19218 SET command should show a result in a sorted order") {
+    val overrideConfs = sql("SET").collect()
+    sql(s"SET test.key3=1")
+    sql(s"SET test.key2=2")
+    sql(s"SET test.key1=3")
+    val result = sql("SET").collect()
+    assert(result ===
+      (overrideConfs ++ Seq(
+        Row("test.key1", "3"),
+        Row("test.key2", "2"),
+        Row("test.key3", "1"))).sortBy(_.getString(0))
+    )
+    spark.sessionState.conf.clear()
+  }
+
+  test("SPARK-19218 `SET -v` should not fail with null value configuration") {
+    import SQLConf._
+    val confEntry = SQLConfigBuilder("spark.test").doc("doc").stringConf.createWithDefault(null)
+
+    try {
+      val result = sql("SET -v").collect()
+      assert(result === result.sortBy(_.getString(0)))
```

--- End diff --

Oh, I understand. Thanks. :)
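The assertion under discussion boils down to "the output of `SET` equals itself sorted by its key column". A minimal stand-in sketch with plain tuples — no Spark involved, and the `test.key*` rows are hypothetical data borrowed from the test:

```python
# Hypothetical (key, value) rows standing in for the output of `SET`.
rows_unsorted = [("test.key3", "1"), ("test.key2", "2"), ("test.key1", "3")]
rows_sorted = sorted(rows_unsorted, key=lambda kv: kv[0])

def is_sorted_by_key(rows):
    """Mirrors the test's check: the result must equal itself sorted by key."""
    return rows == sorted(rows, key=lambda kv: kv[0])

print(is_sorted_by_key(rows_unsorted))  # False
print(is_sorted_by_key(rows_sorted))    # True
```

Comparing the result against its own sorted copy (rather than a hard-coded expected list) is what lets the test tolerate whatever pre-existing configs the suite has set.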
[GitHub] spark issue #16675: [SPARK-19155][ML] Make family case insensitive in GLM
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16675 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71819/
[GitHub] spark issue #16675: [SPARK-19155][ML] Make family case insensitive in GLM
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16675 Merged build finished. Test PASSed.
[GitHub] spark issue #16675: [SPARK-19155][ML] Make family case insensitive in GLM
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16675 **[Test build #71819 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71819/testReport)** for PR 16675 at commit [`c2b4132`](https://github.com/apache/spark/commit/c2b41324f8f6e2e1db3bd121b9e29fd9d6a5d98c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16579: [SPARK-19218][SQL] Fix SET command to show a resu...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16579#discussion_r97250587 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- (same hunk as quoted above) --- End diff -- Oh, I meant that the final Jenkins test result is a failure. Never mind; I think this is still useful, since we can better infer which test causes a failure if we don't interfere with other tests.
[GitHub] spark issue #16636: [SPARK-19279] [SQL] Block Creating a Hive Table With an ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16636 Ideally the table schema must be specified or inferred before saving to the metastore. However, for Hive serde tables we have to save the table to the metastore first and let the Hive metastore infer the schema. Is it possible to extract the schema-inference logic from the Hive metastore, so that we can make data source tables and Hive serde tables more consistent?
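To make "schema inference" concrete, here is a toy sketch of the idea: pick a type per column from sample rows by widening `int` → `float` → `str`. This is purely illustrative — real Hive/Spark inference works on serialized files and is far more involved:

```python
def infer_type(values):
    """Toy per-column type inference: widen int -> float -> str over samples."""
    def typ(v):
        for t in (int, float):
            try:
                t(v)
                return t.__name__
            except ValueError:
                pass
        return "str"
    order = {"int": 0, "float": 1, "str": 2}
    # The widest type seen among the samples wins.
    return max((typ(v) for v in values), key=order.__getitem__)

# Hypothetical sample rows (all values arrive as strings, as in a CSV).
rows = [["1", "2.5", "a"], ["2", "3", "b"]]
schema = [infer_type(col) for col in zip(*rows)]
print(schema)  # ['int', 'float', 'str']
```

Extracting logic like this into a shared component is what the comment above is asking about, so both table kinds could infer a schema before the metastore write.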
[GitHub] spark pull request #16579: [SPARK-19218][SQL] Fix SET command to show a resu...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16579#discussion_r97250343 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- (same hunk as quoted above) --- End diff -- Maybe we are confusing *terms*: you meant the other test *statements*, while I meant the other test *cases*.
[GitHub] spark pull request #16579: [SPARK-19218][SQL] Fix SET command to show a resu...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16579#discussion_r97250196 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- (same hunk as quoted above) --- End diff -- :) The point is that `the other test cases` are still running.
[GitHub] spark pull request #16579: [SPARK-19218][SQL] Fix SET command to show a resu...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16579#discussion_r97250113 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- (same hunk as quoted above) --- End diff -- It failed, didn't it?
[GitHub] spark pull request #16579: [SPARK-19218][SQL] Fix SET command to show a resu...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16579#discussion_r97249959 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- (same hunk as quoted above) --- End diff -- The whole Jenkins test run does not fail. You can see the test report in the PR description, here: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71539/testReport/
[GitHub] spark pull request #16579: [SPARK-19218][SQL] Fix SET command to show a resu...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16579#discussion_r97249854

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- (same hunk as quoted above)
--- End diff --

Hmm, when it throws an exception, the whole test fails, so does it still matter whether it interferes with other tests or not? :-) It is harmless to keep this try block, anyway.
[GitHub] spark issue #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16344

@yanboliang Thanks so much for your detailed review. Your suggestions make a lot of sense, and I have included all of them in the new commit. Let me know if any other change is needed.
[GitHub] spark issue #16671: [SPARK-19327][SparkSQL] a better balance partition metho...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16671

FWIW, I am negative on this approach too. It does not look like a good solution to require full table scans to resolve skew between partitions. As said, it is not good for a large table. Then, if we _should_ resolve the skew and the data is not expected to be very large, why don't we just repartition?
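The effect of the explicit repartition suggested above can be sketched without Spark. This is a toy model only: partitions are plain `Seq`s and the round-robin redistribution stands in for the shuffle that `df.repartition(n)` would trigger (Spark's actual row placement differs, but the balancing effect is the same idea).

```scala
// Skewed layout: one partition holds 90 of the 100 rows.
val skewed: Seq[Seq[Int]] = Seq((1 to 90).toSeq, (91 to 100).toSeq)

// Round-robin redistribution into n partitions, a stand-in for the
// shuffle an explicit repartition performs.
def repartition(parts: Seq[Seq[Int]], n: Int): Seq[Seq[Int]] =
  parts.flatten.zipWithIndex
    .groupBy(_._2 % n)
    .toSeq.sortBy(_._1)
    .map(_._2.map(_._1))

val balanced = repartition(skewed, 4)
println(balanced.map(_.size))  // every partition ends up with the same share
```

The trade-off the thread is weighing: this balancing costs a full shuffle of the data, which is only acceptable when the data is small enough.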
[GitHub] spark issue #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16344

**[Test build #71823 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71823/testReport)** for PR 16344 at commit [`54da2cb`](https://github.com/apache/spark/commit/54da2cbbb53ddde3a91ef6d0d98128d8c7f3deb8).
[GitHub] spark pull request #16579: [SPARK-19218][SQL] Fix SET command to show a resu...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16579#discussion_r97249644

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- (same hunk as quoted above)
--- End diff --

Yes, but we need to clean up `spark.test` in order not to interrupt the other test cases here.
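The cleanup concern above is exactly what `try ... finally` guarantees: the cleanup runs even when the assertion throws. A minimal, Spark-free sketch of the difference (the mutable `conf` map here is only a stand-in for `spark.sessionState.conf`):

```scala
import scala.collection.mutable

// Stand-in for the session conf that the test mutates.
val conf = mutable.Map[String, String]()

// Runs a test body, returning the failure message if it throws.
def runTest(body: => Unit): Option[String] =
  try { body; None } catch { case e: Throwable => Option(e.getMessage) }

// Without finally: a failing assertion skips the cleanup line.
val leaked = runTest {
  conf("spark.test") = "doc"
  assert(false, "simulated regression")
  conf.remove("spark.test")          // never reached
}
println(conf.contains("spark.test"))  // the entry leaks into later tests

// With finally: cleanup runs no matter what the body does.
conf.clear()
val cleaned = runTest {
  conf("spark.test") = "doc"
  try assert(false, "simulated regression")
  finally conf.remove("spark.test")  // always reached
}
println(conf.contains("spark.test"))  // no leak this time
```

So the `try` block is not about catching the failure (the test still fails either way); it is about leaving shared state clean for the next test case.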
[GitHub] spark pull request #16579: [SPARK-19218][SQL] Fix SET command to show a resu...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16579#discussion_r97249538

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- (same hunk as quoted above)
--- End diff --

But you don't actually catch anything, so if there is any regression in the future, is it any different with or without the try? You still see an exception.
[GitHub] spark pull request #16579: [SPARK-19218][SQL] Fix SET command to show a resu...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16579#discussion_r97249317

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- (same hunk as quoted above)
--- End diff --

However, IMO, it's needed in case some regression occurs here in the future.
[GitHub] spark issue #16579: [SPARK-19218][SQL] Fix SET command to show a result corr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16579

**[Test build #71822 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71822/testReport)** for PR 16579 at commit [`7879201`](https://github.com/apache/spark/commit/7879201961b0f0caa997c9fe6446c0b1b46124f8).
[GitHub] spark issue #16675: [SPARK-19155][ML] Make family case insensitive in GLM
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/16675

Looks good, I'll merge it if it passes tests. Thanks.
[GitHub] spark pull request #16579: [SPARK-19218][SQL] Fix SET command to show a resu...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16579#discussion_r97249218

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- (same hunk as quoted above)
--- End diff --

Ah, I see what you meant. Previously, `SET -v` raised exceptions, so this case used `try` and `catch`. But, as you mentioned, it no longer does.
[GitHub] spark pull request #16579: [SPARK-19218][SQL] Fix SET command to show a resu...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16579#discussion_r97249076

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- (same hunk as quoted above)
--- End diff --

Oh, I meant that you actually don't need a `try {} finally {}` here; you don't catch anything.
[GitHub] spark issue #16579: [SPARK-19218][SQL] Fix SET command to show a result corr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16579

**[Test build #71821 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71821/testReport)** for PR 16579 at commit [`7061cd9`](https://github.com/apache/spark/commit/7061cd9ccd5684301efb2c6c6a8b05af36f65417).
[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/16594

@hvanhovell I've updated the description, which shows a simple example. The explained plan becomes hard to read when joining many tables and sizeInBytes is computed the simple (non-cbo) way, i.e. we just multiply all the sizes of these tables; then sizeInBytes becomes a super large value (it can run to more than a hundred digits). E.g. part of the explained plan of tpcds q31 looks like this (not using cbo):
```
== Optimized Logical Plan ==
Sort [ca_county#67 ASC NULLS FIRST], true: sizeInBytes=230,651,011,002,878,340,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000, isBroadcastable=false
+- Project [ca_county#67, d_year#38, CheckOverflow((web_sales#769 / web_sales#6), DecimalType(37,20)) AS web_q1_q2_increase#1, CheckOverflow((store_sales#387 / store_sales#5), DecimalType(37,20)) AS store_q1_q2_increase#2, CheckOverflow((web_sales#960 / web_sales#769), DecimalType(37,20)) AS web_q2_q3_increase#3, CheckOverflow((store_sales#578 / store_sales#387), DecimalType(37,20)) AS store_q2_q3_increase#4]: sizeInBytes=230,651,011,002,878,340,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000, isBroadcastable=false
   +- Join Inner, ((ca_county#271 = ca_county#1132) && (CASE WHEN (web_sales#769 > 0.00) THEN CheckOverflow((web_sales#960 / web_sales#769), DecimalType(37,20)) ELSE null END > CASE WHEN (store_sales#387 > 0.00) THEN CheckOverflow((store_sales#578 / store_sales#387), DecimalType(37,20)) ELSE null END)): sizeInBytes=288,313,763,753,597,950,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000, isBroadcastable=false
      :- Project [ca_county#67, d_year#38, store_sales#5, store_sales#387, store_sales#578, ca_county#271, web_sales#6, web_sales#769]: sizeInBytes=19,387,614,432,995,145,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000, isBroadcastable=false
      :  +- Join Inner, ((ca_county#271 = ca_county#941) && (CASE WHEN (web_sales#6 > 0.00) THEN CheckOverflow((web_sales#769 / web_sales#6), DecimalType(37,20)) ELSE null END > CASE WHEN (store_sales#5 > 0.00) THEN CheckOverflow((store_sales#387 / store_sales#5), DecimalType(37,20)) ELSE null END)): sizeInBytes=23,602,313,222,776,697,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000, isBroadcastable=false
      :     :- Join Inner, (ca_county#67 = ca_county#271): sizeInBytes=1,587,133,900,693,866,200,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000, isBroadcastable=false
      :     :  :- Project [ca_county#67, d_year#38, store_sales#5, store_sales#387, store_sales#578]: sizeInBytes=106,726,573,575,883,570,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000, isBroadcastable=false
      :     :  :  +- Join Inner, (ca_county#559 = ca_county#750): sizeInBytes=182,959,840,415,800,400,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000, isBroadcastable=false
      :     :  :     :- Join Inner, (ca_county#67 = ca_county#559): sizeInBytes=3,338,025,720,406,215,000,000,000,000,000,000,000,000,000,000, isBroadcastable=false
      :     :  :     :  :- Aggregate [ca_county#67, d_qoy#42, d_year#38], [ca_county#67, d_year#38, MakeDecimal(sum(UnscaledValue(ss_ext_sales_price#24)),17,2) AS store_sales#5]: sizeInBytes=60,900,882,318,058,550,000,000, isBroadcastable=false
      :     :  :     :  :  +- Project [ss_ext_sales_price#24, d_year#38, d_qoy#42, ca_county#67]: sizeInBytes=66,990,970,549,864,410,000,000, isBroadcastable=false
      :     :  :     :  :     +- Join Inner, (ss_addr_sk#15 = ca_address_sk#60): sizeInBytes=79,171,147,013,476,130,000,000, isBroadcastable=false
      :     :  :     :  :        :- Project [ss_addr_sk#15, ss_ext_sales_price#24, d_year#38, d_qoy#42]: sizeInBytes=3,963,069,503,456,967, isBroadcastable=false
      :     :  :     :  :        :  +- Join Inner, (ss_sold_date_sk#9 = d_date_sk#32): sizeInBytes=5,095,375,075,873,244, isBroadcastable=false
      :     :  :     :  :        :     :- Project [ss_sold_date_sk#9, ss_addr_sk#15, ss_ext_sales_price#24]: sizeInBytes=39,847,153,628, isBroadcastable=false
      :     :  :     :  :        :     :  +- Filter (isnotnull(ss_sold_date_sk#9) && isnotnull(ss_addr_sk#15)): sizeInBytes=245,724,114,045, isBroadcastable=false
      :     :  :     :  :        :     :     +-
```
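The blow-up described above is easy to reproduce in isolation: under the simple (non-cbo) estimate, a join's sizeInBytes is the product of its children's sizes, so the digit counts add with every join. A small sketch with one made-up per-table size (the value and join count below are illustrative, not taken from q31):

```scala
// Made-up size estimate for each base table, in bytes (11 digits, ~40 GB).
val tableSize = BigInt("39847153628")

// The simple estimate for a multi-way join just multiplies the children's
// sizes, so ten 11-digit estimates compound into a 100+ digit number.
val joinEstimate = Seq.fill(10)(tableSize).product

println(joinEstimate.toString.length)  // digit count of the final estimate
```

That is why a many-table plan printed with these annotations becomes unreadable: each join node carries a number whose length is roughly the sum of its inputs' lengths.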
[GitHub] spark issue #16579: [SPARK-19218][SQL] Fix SET command to show a result corr...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16579

Thank you, @viirya. I noticed that `spark.sessionState.conf.clear()` is useless, so I removed it.
[GitHub] spark issue #16675: [SPARK-19155][ML] Make family case insensitive in GLM
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16675

@yanboliang Thanks for the quick response. How about the new commit, where I just change the value from `getFamily` to lower case when necessary, i.e., in the calculation of p-value and dispersion?
[GitHub] spark issue #16675: [SPARK-19155][ML] Make family case insensitive in GLM
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16675

**[Test build #71820 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71820/testReport)** for PR 16675 at commit [`97b0a1c`](https://github.com/apache/spark/commit/97b0a1c9e5f7bfdae2407d5017418f3dda9a1e71).
[GitHub] spark pull request #16579: [SPARK-19218][SQL] Fix SET command to show a resu...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16579#discussion_r97248522

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -982,6 +982,33 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
     spark.sessionState.conf.clear()
   }
+
+  test("SPARK-19218 SET command should show a result in a sorted order") {
+    val overrideConfs = sql("SET").collect()
+    sql(s"SET test.key3=1")
+    sql(s"SET test.key2=2")
+    sql(s"SET test.key1=3")
+    val result = sql("SET").collect()
+    assert(result ===
+      (overrideConfs ++ Seq(
+        Row("test.key1", "3"),
+        Row("test.key2", "2"),
+        Row("test.key3", "1"))).sortBy(_.getString(0))
+    )
+  }
+
+  test("SPARK-19218 `SET -v` should not fail with null value configuration") {
+    import SQLConf._
+    val confEntry = SQLConfigBuilder("spark.test").doc("doc").stringConf.createWithDefault(null)
+
+    try {
+      val result = sql("SET -v").collect()
+      assert(result === result.sortBy(_.getString(0)))
+      spark.sessionState.conf.clear()
+    } finally {
--- End diff --

nit: `try ... finally` seems redundant.
[GitHub] spark issue #16675: [SPARK-19155][ML] Make family case insensitive in GLM
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/16675

@actuaryzhang I think the change is not appropriate; the function `getFamily` should return the raw value that users specified. This is the reason I didn't change it in #16516. Thanks.
[GitHub] spark issue #16579: [SPARK-19218][SQL] Fix SET command to show a result corr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16579

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71816/
[GitHub] spark issue #16579: [SPARK-19218][SQL] Fix SET command to show a result corr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16579

Merged build finished. Test PASSed.
[GitHub] spark issue #16579: [SPARK-19218][SQL] Fix SET command to show a result corr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16579

**[Test build #71816 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71816/testReport)** for PR 16579 at commit [`387ab59`](https://github.com/apache/spark/commit/387ab590b8af301433e888e2d7731213e4e254a5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #16675: [SPARK-19155][ML] Make family case insensitive in GLM
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16675

I would prefer that `getFamily` returns lower case values directly, because using `getFamily.toLowerCase` can get very cumbersome, and I use this a lot in another PR #16344. If we want to keep `getFamily` to retrieve the raw value of family, then I can create a private method `getFamilyLowerCase`. Please advise.
[GitHub] spark issue #16675: [SPARK-19155][ML] Make family case insensitive in GLM
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16675

**[Test build #71819 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71819/testReport)** for PR 16675 at commit [`c2b4132`](https://github.com/apache/spark/commit/c2b41324f8f6e2e1db3bd121b9e29fd9d6a5d98c).
[GitHub] spark pull request #16636: [SPARK-19279] [SQL] Block Creating a Hive Table W...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16636#discussion_r97247351

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
@@ -1527,6 +1527,21 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach {
   }
 }
+
+  test("create a data source table without schema") {
+    import testImplicits._
+    withTempPath { tempDir =>
+      withTable("tab1", "tab2") {
+        (("a", "b") :: Nil).toDF().write.json(tempDir.getCanonicalPath)
+
+        val e = intercept[AnalysisException] { sql("CREATE TABLE tab1 USING json") }.getMessage
--- End diff --

We should also test a data source that can infer schema without files (e.g. the LibSVM data source has a fixed schema). Ideally we should only fail if the given data source can't infer schema without files.
[GitHub] spark issue #16479: [SPARK-19085][SQL] cleanup OutputWriterFactory and Outpu...
Github user koertkuipers commented on the issue: https://github.com/apache/spark/pull/16479

I will just copy the conversion code over for now. Thanks.
[GitHub] spark pull request #16675: [SPARK-19155][ML] make getFamily case insensitive
GitHub user actuaryzhang opened a pull request: https://github.com/apache/spark/pull/16675 [SPARK-19155][ML] make getFamily case insensitive

## What changes were proposed in this pull request?

This is a supplement to PR #16516, which did not make the value from `getFamily` case insensitive. This affects the calculation of `dispersion` and `pValue`, since the value of family is checked there: `model.getFamily == Binomial.name || model.getFamily == Poisson.name`. Current tests of poisson/binomial glm with weight fail when specifying 'Poisson' or 'Binomial'. A simple fix is to convert the value of `getFamily` to lower case:
```
def getFamily: String = $(family).toLowerCase
```

## How was this patch tested?

Updated existing tests for 'Poisson' and 'Binomial'.

@yanboliang @felixcheung @imatiach-msft

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/actuaryzhang/spark family

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16675.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16675

commit d33e2f135ae62df20337e2752753bcda2756a73d
Author: actuaryzhang
Date: 2017-01-23T02:59:12Z

    make getFamily case insensitive
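The proposed fix can be sketched without Spark. The objects and the `Model` class below are stand-ins mirroring the PR's `Binomial.name`/`Poisson.name` comparison, not Spark ML's real types; the point is that lower-casing once at the accessor makes every downstream equality check case-insensitive:

```scala
// Stand-ins for the GLM family objects the PR's check refers to.
object Binomial { val name = "binomial" }
object Poisson  { val name = "poisson" }

// Hypothetical model with the proposed fix: normalize case in the accessor.
final case class Model(family: String) {
  def getFamily: String = family.toLowerCase
}

// The dispersion/pValue guard from the PR description, verbatim in shape.
def usesUnitDispersion(model: Model): Boolean =
  model.getFamily == Binomial.name || model.getFamily == Poisson.name

println(usesUnitDispersion(Model("Poisson")))   // user-specified mixed case now matches
println(usesUnitDispersion(Model("binomial")))
println(usesUnitDispersion(Model("gaussian")))
```

The review discussion above weighs the cost: once `getFamily` lower-cases, it no longer returns the raw value the user specified, which is why the alternative of calling `.toLowerCase` only at the comparison sites was also considered.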
[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16659 **[Test build #71818 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71818/testReport)** for PR 16659 at commit [`0753ee6`](https://github.com/apache/spark/commit/0753ee6da4d5698d3a30d89e60ec45aca9e18f35).
[GitHub] spark issue #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16344 Merged build finished. Test PASSed.
[GitHub] spark issue #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16344 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71817/