[GitHub] spark pull request #15190: [SPARK-17620][SQL] Use the storage format specifi...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/15190#discussion_r79979332 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala --- @@ -988,9 +988,7 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder { .orElse(Some("org.apache.hadoop.mapred.TextInputFormat")), outputFormat = defaultHiveSerde.flatMap(_.outputFormat) .orElse(Some("org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat")), -// Note: Keep this unspecified because we use the presence of the serde to decide --- End diff -- @viirya @cloud-fan Actually, I am not sure if the above comment is in sync with the code. When we had this comment, we used CreateTableAsSelectLogicalPlan to represent the CTAS case, and we checked for the serde's presence to decide whether or not to convert it to a data source table, like the following:
```scala
if (sessionState.convertCTAS && table.storage.serde.isEmpty) {
  // Do the conversion when spark.sql.hive.convertCTAS is true and the query
  // does not specify any storage format (file format and storage handler).
  if (table.identifier.database.isDefined) {
    throw new AnalysisException(
      "Cannot specify database name in a CTAS statement " +
        "when spark.sql.hive.convertCTAS is set to true.")
  }
  val mode = if (allowExisting) SaveMode.Ignore else SaveMode.ErrorIfExists
  CreateTableUsingAsSelect(
    TableIdentifier(desc.identifier.table),
    conf.defaultDataSourceName,
    temporary = false,
    Array.empty[String],
    bucketSpec = None,
    mode,
    options = Map.empty[String, String],
    child
  )
} else {
  val desc = if (table.storage.serde.isEmpty) {
    // add default serde
    table.withNewStorage(
      serde = Some("org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"))
  } else {
    table
  }
```
I think this code has since changed and moved to SparkSqlParser? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. 
If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15190: [SPARK-17620][SQL] Use the storage format specified by h...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15190 PR title should be ```Determine Serde by hive.default.fileformat when Creating Hive Serde Tables```
[GitHub] spark pull request #15190: [SPARK-17620][SQL] Use the storage format specifi...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15190#discussion_r79978495 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala --- @@ -988,9 +988,7 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder { .orElse(Some("org.apache.hadoop.mapred.TextInputFormat")), outputFormat = defaultHiveSerde.flatMap(_.outputFormat) .orElse(Some("org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat")), -// Note: Keep this unspecified because we use the presence of the serde to decide --- End diff -- The current checking conditions are based on [ctx.createFileFormat and ctx.rowFormat](https://github.com/dilipbiswal/spark/blob/f2b93de629f378ca99f8d3086ade8dc05b41a912/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala#L1051-L1052). Thus, I think this PR looks ok. : )
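The presence check referenced above can be sketched as follows. This is a hedged illustration, not the exact Spark code: `CreateTableContext` and the helper name are stand-ins, assuming only that absent parser clauses surface as null fields, as in ANTLR-generated contexts.

```scala
// Illustrative stand-in for the generated ANTLR parser context; in Spark the
// real fields are ctx.rowFormat and ctx.createFileFormat on the context class.
final case class CreateTableContext(rowFormat: AnyRef, createFileFormat: AnyRef)

// The decision sketched: only fall back to the default Hive serde when the
// user specified neither ROW FORMAT nor STORED AS. An absent clause shows up
// as a null field in the generated context.
def hasUserSpecifiedStorage(ctx: CreateTableContext): Boolean =
  ctx.rowFormat != null || ctx.createFileFormat != null
```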
[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15102 > I'd want to see some test cases though that show why the current implementation is wrong from an end-user perspective if it needs to block merging initial kafka support. PR with a failing test indicating at least one reason why it's wrong from an end-user perspective: https://github.com/zsxwing/spark/pull/4 > I do not think it is reasonable to suggest we block merging this patch on an overhaul of the DataSource API configuration system. Here's what I actually said: 'if you know your plan down the line is to use json for structured configuration, you should use it now, and provide more convenient ways to construct json later, not use "convenient" non-json hacks now.' No hyperbole about blocking on a complete overhaul, nothing that isn't backwards compatible. I'm just saying that, if the design document already recognizes that json is necessary to work around the string -> string interface... start using structured json strings now, and make it more convenient later. Or do you actually think that stuff like option("assign", "topicA:1:1,topicA:2:2,topicB:3:3") makes it clear what the arguments are? > I think @koeninger made a good suggestion to block accepting certain kafka configurations. In case it wasn't clear, I was not suggesting that preventing users from doing things they could otherwise do with Kafka is actually a good idea. I think it's a bad idea, but if you're going to run with it, you might as well be consistent about it.
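The contrast koeninger draws can be illustrated with a sketch. This is hedged: the `"assign"` option name comes from the discussion above, but the JSON layout shown is one possible encoding for illustration, not the actual API that was eventually merged.

```scala
import org.apache.spark.sql.SparkSession

object AssignOptionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("sketch").getOrCreate()

    // The packed-string style under discussion: a reader cannot tell which
    // numbers are partitions and which are starting offsets.
    val packed = spark.readStream.format("kafka")
      .option("assign", "topicA:1:1,topicA:2:2,topicB:3:3")

    // A JSON-structured alternative (illustrative encoding only): the same
    // information with explicit topic -> partition -> offset nesting, which
    // survives the string -> string option interface unambiguously.
    val structured = spark.readStream.format("kafka")
      .option("assign", """{"topicA": {"1": 1, "2": 2}, "topicB": {"3": 3}}""")
  }
}
```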
[GitHub] spark pull request #15190: [SPARK-17620][SQL] Use the storage format specifi...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15190#discussion_r79978157 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala --- @@ -988,9 +988,7 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder { .orElse(Some("org.apache.hadoop.mapred.TextInputFormat")), outputFormat = defaultHiveSerde.flatMap(_.outputFormat) .orElse(Some("org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat")), -// Note: Keep this unspecified because we use the presence of the serde to decide --- End diff -- The comment is not valid now. This was removed by the PR: https://github.com/apache/spark/pull/13386
[GitHub] spark pull request #15190: [SPARK-17620][SQL] Use the storage format specifi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15190#discussion_r79977535 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala --- @@ -988,9 +988,7 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder { .orElse(Some("org.apache.hadoop.mapred.TextInputFormat")), outputFormat = defaultHiveSerde.flatMap(_.outputFormat) .orElse(Some("org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat")), -// Note: Keep this unspecified because we use the presence of the serde to decide --- End diff -- cc @yhuai to confirm
[GitHub] spark pull request #15046: [SPARK-17492] [SQL] Fix Reading Cataloged Data So...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15046
[GitHub] spark issue #15046: [SPARK-17492] [SQL] Fix Reading Cataloged Data Sources w...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15046 thanks, merging to master!
[GitHub] spark issue #15174: [SPARK-17502] [SQL] [Backport] [2.0] Fix Multiple Bugs i...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15174 Sure, will do it.
[GitHub] spark pull request #15190: [SPARK-17620][SQL] Use the storage format specifi...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15190#discussion_r79976580 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala --- @@ -988,9 +988,7 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder { .orElse(Some("org.apache.hadoop.mapred.TextInputFormat")), outputFormat = defaultHiveSerde.flatMap(_.outputFormat) .orElse(Some("org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat")), -// Note: Keep this unspecified because we use the presence of the serde to decide --- End diff -- I think this is kept unspecified because it is intended to write the table through the Hive write path. If we specify a serde here, it will be converted to a data source table. Is that ok? cc @cloud-fan
[GitHub] spark pull request #14988: [SPARK-17425][SQL] Override sameResult in HiveTab...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14988
[GitHub] spark issue #14988: [SPARK-17425][SQL] Override sameResult in HiveTableScanE...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14988 LGTM, merging to master!
[GitHub] spark issue #14537: [SPARK-16948][SQL] Use metastore schema instead of infer...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14537 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65755/ Test FAILed.
[GitHub] spark issue #14537: [SPARK-16948][SQL] Use metastore schema instead of infer...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14537 Merged build finished. Test FAILed.
[GitHub] spark issue #14537: [SPARK-16948][SQL] Use metastore schema instead of infer...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14537 **[Test build #65755 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65755/consoleFull)** for PR 14537 at commit [`fa71370`](https://github.com/apache/spark/commit/fa713700f853e78053ac0be5db49250951aaa715). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15174: [SPARK-17502] [SQL] [Backport] [2.0] Fix Multiple Bugs i...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15174 LGTM, if you have time, can you also include https://github.com/apache/spark/pull/15160? they are kind of related. thanks!
[GitHub] spark pull request #15160: [SPARK-17609][SQL] SessionCatalog.tableExists sho...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15160
[GitHub] spark issue #15160: [SPARK-17609][SQL] SessionCatalog.tableExists should not...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15160 thanks for the review, merging to master!
[GitHub] spark issue #14537: [SPARK-16948][SQL] Use metastore schema instead of infer...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14537 LGTM, pending jenkins
[GitHub] spark issue #15090: [SPARK-17073] [SQL] generate column-level statistics
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15090 The test suite `StatisticsColumnSuite` misses negative cases. For example, so far we do not allow users to analyze temporary tables. Ideally, every exception the code could issue needs a test case.
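A negative case of the kind requested might look like the following sketch. This is hedged: it assumes a suite mixing in `SharedSQLContext` (as in the PR's tests), and the exact error-message text checked here is an assumption, not the real message.

```scala
// Sketch of a negative test: analyzing column statistics on a temporary view
// should be rejected. The asserted message fragment is an assumption.
test("analyze column stats on a temporary table is rejected") {
  withTempView("tempView") {
    spark.range(10).createOrReplaceTempView("tempView")
    val e = intercept[AnalysisException] {
      sql("ANALYZE TABLE tempView COMPUTE STATISTICS FOR COLUMNS id")
    }
    assert(e.message.contains("temporary"))
  }
}
```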
[GitHub] spark issue #15194: New feature for structured streaming: add http stream si...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15194 Can one of the admins verify this patch?
[GitHub] spark issue #15182: [SPARK-17625] [SQL] set expectedOutputAttributes when co...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15182 LGTM
[GitHub] spark pull request #15194: New feature for structured streaming: add http st...
GitHub user zhangxinyu1 opened a pull request: https://github.com/apache/spark/pull/15194 New feature for structured streaming: add http stream sink ## What changes were proposed in this pull request? Add an http stream sink for Structured Streaming. Streaming query results can be sent to an http server through http POST requests. ## How was this patch tested? Use the [quick-example](http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#quick-example) and configure DataStreamWriter with .format("http").option("url", httpUrl) You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhangxinyu1/spark feature-for-structed-streaming-add-http-sink Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15194.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15194 commit 87c48c7ed284b95a27e5a6c7f59ee836a95bb588 Author: zhangxinyu1 <342689...@qq.com> Date: 2016-09-21T07:00:39Z add feature: streaming query results can be output to http server commit 489f629783768bef1024de55367c67c26c7192d0 Author: zhangxinyu1 <342689...@qq.com> Date: 2016-09-22T04:09:56Z new feature for structed streaming: http sink commit f6eca02c4a44a65e012bec8c294b861de9c19560 Author: zhangxinyu1 <342689...@qq.com> Date: 2016-09-22T04:15:35Z new feature for structed streaming: http sink commit 96f17b1397d5858a4ce709691b632852b02682e2 Author: zhangxinyu1 <342689...@qq.com> Date: 2016-09-22T04:25:03Z new feature for structed streaming: add http sink
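The usage the PR description mentions can be sketched as follows. This is hedged: the `"http"` format is what this PR proposes and does not exist in stock Spark, and `httpUrl` is a placeholder endpoint, not a real service.

```scala
import org.apache.spark.sql.SparkSession

object HttpSinkSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("http-sink").getOrCreate()
    import spark.implicits._

    // Word count from the Structured Streaming quick example linked above.
    val lines = spark.readStream.format("socket")
      .option("host", "localhost").option("port", 9999).load()
    val wordCounts = lines.as[String].flatMap(_.split(" ")).groupBy("value").count()

    // Sink each result set to an HTTP endpoint via POST, as the PR proposes.
    // "http" is the format this PR would register; httpUrl is a placeholder.
    val httpUrl = "http://localhost:8080/ingest"
    val query = wordCounts.writeStream
      .outputMode("complete")
      .format("http")
      .option("url", httpUrl)
      .start()
    query.awaitTermination()
  }
}
```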
[GitHub] spark pull request #15090: [SPARK-17073] [SQL] generate column-level statist...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15090#discussion_r79975372 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala ---
```diff
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.command
+
+import scala.collection.mutable
+
+import org.apache.spark.sql._
+import org.apache.spark.sql.catalyst.{InternalRow, TableIdentifier}
+import org.apache.spark.sql.catalyst.analysis.EliminateSubqueryAliases
+import org.apache.spark.sql.catalyst.catalog.{CatalogRelation, CatalogTable}
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.expressions.aggregate._
+import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, ColumnStats, LogicalPlan, Statistics}
+import org.apache.spark.sql.execution.datasources.LogicalRelation
+import org.apache.spark.sql.types._
+
+
+/**
+ * Analyzes the given columns of the given table in the current database to generate statistics,
+ * which will be used in query optimizations.
+ */
+case class AnalyzeColumnCommand(
+    tableIdent: TableIdentifier,
+    columnNames: Seq[String]) extends RunnableCommand {
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+    val sessionState = sparkSession.sessionState
+    val db = tableIdent.database.getOrElse(sessionState.catalog.getCurrentDatabase)
+    val tableIdentWithDB = TableIdentifier(tableIdent.table, Some(db))
+    val relation = EliminateSubqueryAliases(sessionState.catalog.lookupRelation(tableIdentWithDB))
+
+    relation match {
+      case catalogRel: CatalogRelation =>
+        updateStats(catalogRel.catalogTable,
+          AnalyzeTableCommand.calculateTotalSize(sessionState, catalogRel.catalogTable))
+
+      case logicalRel: LogicalRelation if logicalRel.catalogTable.isDefined =>
+        updateStats(logicalRel.catalogTable.get, logicalRel.relation.sizeInBytes)
+
+      case otherRelation =>
+        throw new AnalysisException("ANALYZE TABLE is not supported for " +
+          s"${otherRelation.nodeName}.")
+    }
+
+    def updateStats(catalogTable: CatalogTable, newTotalSize: Long): Unit = {
+      val (rowCount, columnStats) = computeColStats(sparkSession, relation)
+      val statistics = Statistics(
+        sizeInBytes = newTotalSize,
+        rowCount = Some(rowCount),
+        colStats = columnStats ++ catalogTable.stats.map(_.colStats).getOrElse(Map()))
+      sessionState.catalog.alterTable(catalogTable.copy(stats = Some(statistics)))
+      // Refresh the cached data source table in the catalog.
+      sessionState.catalog.refreshTable(tableIdentWithDB)
+    }
+
+    Seq.empty[Row]
+  }
+
+  def computeColStats(
+      sparkSession: SparkSession,
+      relation: LogicalPlan): (Long, Map[String, ColumnStats]) = {
+
+    // check correctness of column names
+    val attributesToAnalyze = mutable.MutableList[Attribute]()
+    val caseSensitive = sparkSession.sessionState.conf.caseSensitiveAnalysis
+    columnNames.foreach { col =>
+      val exprOption = relation.output.find { attr =>
+        if (caseSensitive) attr.name == col else attr.name.equalsIgnoreCase(col)
+      }
+      val expr = exprOption.getOrElse(throw new AnalysisException(s"Invalid column name: $col."))
+      // do deduplication
+      if (!attributesToAnalyze.contains(expr)) {
```
--- End diff -- Deduplication lacks case sensitivity handling.
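Deduplication that accounts for case sensitivity might look like this sketch. It is hedged: the helper name is hypothetical, and it only mirrors the case-sensitive/insensitive comparison the diff uses for resolution.

```scala
// Sketch: deduplicate requested column names with the same comparison rule
// used for resolution, so "c1" and "C1" collapse to one entry when the
// analysis is case-insensitive. Helper name is illustrative, not Spark's.
def dedupColumnNames(columnNames: Seq[String], caseSensitive: Boolean): Seq[String] = {
  val seen = scala.collection.mutable.LinkedHashMap.empty[String, String]
  columnNames.foreach { col =>
    // Normalize the key when case-insensitive; keep the first spelling seen.
    val key = if (caseSensitive) col else col.toLowerCase
    if (!seen.contains(key)) seen(key) = col
  }
  seen.values.toSeq
}
```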
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14035 **[Test build #65758 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65758/consoleFull)** for PR 14035 at commit [`13b1a67`](https://github.com/apache/spark/commit/13b1a6751902493e458af162b222aebf879d41da).
[GitHub] spark pull request #15090: [SPARK-17073] [SQL] generate column-level statist...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15090#discussion_r79975005 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsTest.scala ---
```diff
@@ -0,0 +1,93 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.plans.logical.{ColumnStats, Statistics}
+import org.apache.spark.sql.execution.command.AnalyzeColumnCommand
+import org.apache.spark.sql.execution.datasources.LogicalRelation
+import org.apache.spark.sql.test.SharedSQLContext
+import org.apache.spark.sql.types._
+
+trait StatisticsTest extends QueryTest with SharedSQLContext {
+
+  def checkColStats(
+      df: DataFrame,
+      expectedColStatsSeq: Seq[(String, ColumnStats)]): Unit = {
+    val table = "tbl"
+    withTable(table) {
+      df.write.format("json").saveAsTable(table)
+      val columns = expectedColStatsSeq.map(_._1)
+      val tableIdent = TableIdentifier(table, Some("default"))
+      val relation = spark.sessionState.catalog.lookupRelation(tableIdent)
+      val columnStats =
+        AnalyzeColumnCommand(tableIdent, columns).computeColStats(spark, relation)._2
+      expectedColStatsSeq.foreach { expected =>
+        assert(columnStats.contains(expected._1))
+        checkColStats(colStats = columnStats(expected._1), expectedColStats = expected._2)
+      }
+    }
+  }
+
+  def checkColStats(colStats: ColumnStats, expectedColStats: ColumnStats): Unit = {
+    assert(colStats.dataType == expectedColStats.dataType)
+    assert(colStats.numNulls == expectedColStats.numNulls)
+    colStats.dataType match {
+      case _: IntegralType | DateType | TimestampType =>
+        assert(colStats.max.map(_.toString.toLong) == expectedColStats.max.map(_.toString.toLong))
+        assert(colStats.min.map(_.toString.toLong) == expectedColStats.min.map(_.toString.toLong))
+      case _: FractionalType =>
+        assert(colStats.max.map(_.toString.toDouble) == expectedColStats
+          .max.map(_.toString.toDouble))
+        assert(colStats.min.map(_.toString.toDouble) == expectedColStats
+          .min.map(_.toString.toDouble))
+      case _ =>
+        // other types don't have max and min stats
+        assert(colStats.max.isEmpty)
+        assert(colStats.min.isEmpty)
+    }
+    colStats.dataType match {
+      case BinaryType | BooleanType =>
+        assert(colStats.ndv.isEmpty)
+      case _ =>
+        // ndv is an approximate value, so we make sure we have the value, and it should be
+        // within 3*SD's of the given rsd.
+        assert(colStats.ndv.get >= 0)
+        if (expectedColStats.ndv.get == 0) {
+          assert(colStats.ndv.get == 0)
+        } else if (expectedColStats.ndv.get > 0) {
+          val rsd = spark.sessionState.conf.ndvMaxError
+          val error = math.abs((colStats.ndv.get / expectedColStats.ndv.get.toDouble) - 1.0d)
+          assert(error <= rsd * 3.0d, "Error should be within 3 std. errors.")
+        }
+    }
+    assert(colStats.avgColLen == expectedColStats.avgColLen)
+    assert(colStats.maxColLen == expectedColStats.maxColLen)
+    assert(colStats.numTrues == expectedColStats.numTrues)
+    assert(colStats.numFalses == expectedColStats.numFalses)
+  }
+
+  def checkTableStats(tableName: String, expectedRowCount: Option[Int]): Option[Statistics] = {
+    val df = sql(s"SELECT * FROM $tableName")
```
--- End diff --
```scala
val df = spark.table(tableName)
```
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14035 **[Test build #65757 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65757/consoleFull)** for PR 14035 at commit [`2cbcabd`](https://github.com/apache/spark/commit/2cbcabdcef32280316db1ede1a22934dacf3cf35).
[GitHub] spark pull request #15090: [SPARK-17073] [SQL] generate column-level statist...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15090#discussion_r79974658

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -473,15 +476,20 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
       }
     }
     // construct Spark's statistics from information in Hive metastore
-    if (catalogTable.properties.contains(STATISTICS_TOTAL_SIZE)) {
-      val totalSize = BigInt(catalogTable.properties.get(STATISTICS_TOTAL_SIZE).get)
-      // TODO: we will compute "estimatedSize" when we have column stats:
-      // average size of row * number of rows
+    if (catalogTable.properties.filterKeys(_.startsWith(STATISTICS_PREFIX)).nonEmpty) {
+      val colStatsProps = catalogTable.properties
+        .filterKeys(_.startsWith(STATISTICS_BASIC_COL_STATS_PREFIX))
+        .map { case (k, v) => (k.replace(STATISTICS_BASIC_COL_STATS_PREFIX, ""), v)}

--- End diff --

Add a space between `)` and `}`
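The hunk above selects table properties under a statistics prefix and strips that prefix to recover per-column keys. A hedged, self-contained sketch of that map transformation; the prefix string below is made up for the example (Spark keeps the real `STATISTICS_*` constants private to `HiveExternalCatalog`):

```scala
// Illustrative sketch of the prefix-filter-and-strip step in the hunk above.
// The Prefix value is invented for this example.
object ColStatsProps {
  val Prefix = "spark.sql.statistics.colStats."

  def extract(props: Map[String, String]): Map[String, String] =
    props.filterKeys(_.startsWith(Prefix))
      .map { case (k, v) => (k.stripPrefix(Prefix), v) } // space before `}` per the review
      .toMap
}
```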
[GitHub] spark pull request #15090: [SPARK-17073] [SQL] generate column-level statist...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15090#discussion_r79974623

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala ---
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.command
+
+import scala.collection.mutable
+
+import org.apache.spark.sql._
+import org.apache.spark.sql.catalyst.{InternalRow, TableIdentifier}
+import org.apache.spark.sql.catalyst.analysis.EliminateSubqueryAliases
+import org.apache.spark.sql.catalyst.catalog.{CatalogRelation, CatalogTable}
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.expressions.aggregate._
+import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, ColumnStats, LogicalPlan, Statistics}
+import org.apache.spark.sql.execution.datasources.LogicalRelation
+import org.apache.spark.sql.types._
+
+
+/**
+ * Analyzes the given columns of the given table in the current database to generate statistics,
+ * which will be used in query optimizations.
+ */
+case class AnalyzeColumnCommand(
+    tableIdent: TableIdentifier,
+    columnNames: Seq[String]) extends RunnableCommand {
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+    val sessionState = sparkSession.sessionState
+    val db = tableIdent.database.getOrElse(sessionState.catalog.getCurrentDatabase)
+    val tableIdentWithDB = TableIdentifier(tableIdent.table, Some(db))
+    val relation = EliminateSubqueryAliases(sessionState.catalog.lookupRelation(tableIdentWithDB))
+
+    relation match {
+      case catalogRel: CatalogRelation =>
+        updateStats(catalogRel.catalogTable,
+          AnalyzeTableCommand.calculateTotalSize(sessionState, catalogRel.catalogTable))
+
+      case logicalRel: LogicalRelation if logicalRel.catalogTable.isDefined =>
+        updateStats(logicalRel.catalogTable.get, logicalRel.relation.sizeInBytes)
+
+      case otherRelation =>
+        throw new AnalysisException("ANALYZE TABLE is not supported for " +
+          s"${otherRelation.nodeName}.")
+    }
+
+    def updateStats(catalogTable: CatalogTable, newTotalSize: Long): Unit = {
+      val (rowCount, columnStats) = computeColStats(sparkSession, relation)
+      val statistics = Statistics(
+        sizeInBytes = newTotalSize,
+        rowCount = Some(rowCount),
+        colStats = columnStats ++ catalogTable.stats.map(_.colStats).getOrElse(Map()))
+      sessionState.catalog.alterTable(catalogTable.copy(stats = Some(statistics)))
+      // Refresh the cached data source table in the catalog.
+      sessionState.catalog.refreshTable(tableIdentWithDB)
+    }
+
+    Seq.empty[Row]
+  }
+
+  def computeColStats(
+      sparkSession: SparkSession,
+      relation: LogicalPlan): (Long, Map[String, ColumnStats]) = {
+
+    // check correctness of column names
+    val attributesToAnalyze = mutable.MutableList[Attribute]()
+    val caseSensitive = sparkSession.sessionState.conf.caseSensitiveAnalysis
+    columnNames.foreach { col =>
+      val exprOption = relation.output.find { attr =>
+        if (caseSensitive) attr.name == col else attr.name.equalsIgnoreCase(col)
+      }
+      val expr = exprOption.getOrElse(throw new AnalysisException(s"Invalid column name: $col."))
+      // do deduplication
+      if (!attributesToAnalyze.contains(expr)) {
+        attributesToAnalyze += expr
+      }
+    }
+
+    // Collect statistics per column.
+    // The first element in the result will be the overall row count, the following elements
+    // will be structs containing all column stats.
+    // The layout of each struct follows the layout of the ColumnStats.
+    val ndvMaxErr = sparkSession.sessionState.conf.ndvMaxError
+    val expressions =
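The `computeColStats` fragment above resolves each requested column name against the plan's output, honoring case sensitivity, and de-duplicates repeats before aggregating. A minimal model of that resolution loop, with attributes reduced to plain strings (the real code works on Catalyst `Attribute` objects; all names here are illustrative):

```scala
// Minimal model of the column-resolution loop in computeColStats above.
// Attributes are modeled as plain strings for the sake of the example.
object ColumnResolution {
  def resolve(
      columnNames: Seq[String],
      output: Seq[String],
      caseSensitive: Boolean): Seq[String] = {
    // LinkedHashSet de-duplicates while preserving first-seen order.
    val resolved = scala.collection.mutable.LinkedHashSet.empty[String]
    columnNames.foreach { col =>
      val attr = output
        .find(a => if (caseSensitive) a == col else a.equalsIgnoreCase(col))
        .getOrElse(throw new IllegalArgumentException(s"Invalid column name: $col."))
      resolved += attr
    }
    resolved.toSeq
  }
}
```

Note that under case-insensitive analysis, `"A"` and `"a"` resolve to the same attribute and are counted once, which matches the "do deduplication" step in the diff.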
[GitHub] spark issue #15182: [SPARK-17625] [SQL] set expectedOutputAttributes when co...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15182 LGTM pending Jenkins
[GitHub] spark issue #13512: [SPARK-15769][SQL] Add Encoder for input type to Aggrega...
Github user koertkuipers commented on the issue: https://github.com/apache/spark/pull/13512 @cloud-fan i thought about this a little more, and my suggested changes to the Aggregator api does not allow one to use a different encoder when applying a typed operation on Dataset. so i do not think it is dangerous as such. it does enable usage within the untyped grouping, which is where type conversions are already customary anyhow. its not more dangerous than say using a udaf in a DataFrame.
[GitHub] spark issue #15182: [SPARK-17625] [SQL] set expectedOutputAttributes when co...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15182 **[Test build #65756 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65756/consoleFull)** for PR 15182 at commit [`e2c3b9d`](https://github.com/apache/spark/commit/e2c3b9df0431885efbc9575beb7735590a77cf2f).
[GitHub] spark pull request #15154: [SPARK-17494] [SQL] changePrecision() on compact ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15154
[GitHub] spark issue #15182: [SPARK-17625] [SQL] set expectedOutputAttributes when co...
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/15182 @cloud-fan Ok.
[GitHub] spark pull request #15188: [SPARK-17627] Mark Streaming Providers Experiment...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15188
[GitHub] spark issue #15154: [SPARK-17494] [SQL] changePrecision() on compact decimal...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15154 Merging in master/2.0.
[GitHub] spark issue #15188: [SPARK-17627] Mark Streaming Providers Experimental
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15188 Merging in master/2.0.
[GitHub] spark issue #14537: [SPARK-16948][SQL] Use metastore schema instead of infer...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14537 **[Test build #65755 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65755/consoleFull)** for PR 14537 at commit [`fa71370`](https://github.com/apache/spark/commit/fa713700f853e78053ac0be5db49250951aaa715).
[GitHub] spark issue #15090: [SPARK-17073] [SQL] generate column-level statistics
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15090 **[Test build #65754 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65754/consoleFull)** for PR 15090 at commit [`5f6b581`](https://github.com/apache/spark/commit/5f6b5817d59c1b6bb48563357f625521e7c56236).
[GitHub] spark issue #15005: [SPARK-17421] [DOCS] Documenting the current treatment o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15005 **[Test build #3286 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3286/consoleFull)** for PR 15005 at commit [`53a09cd`](https://github.com/apache/spark/commit/53a09cd5783d55048b2cf7579cf53ccc76bdf3d7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14537: [SPARK-16948][SQL] Use metastore schema instead o...
Github user rajeshbalamohan commented on a diff in the pull request: https://github.com/apache/spark/pull/14537#discussion_r79972251

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -237,21 +237,27 @@ private[hive] class HiveMetastoreCatalog(sparkSession: SparkSession) extends Log
           new Path(metastoreRelation.catalogTable.storage.locationUri.get),
           partitionSpec)

-        val inferredSchema = if (fileType.equals("parquet")) {
-          val inferredSchema =
-            defaultSource.inferSchema(sparkSession, options, fileCatalog.allFiles())
-          inferredSchema.map { inferred =>
-            ParquetFileFormat.mergeMetastoreParquetSchema(metastoreSchema, inferred)
-          }.getOrElse(metastoreSchema)
-        } else {
-          defaultSource.inferSchema(sparkSession, options, fileCatalog.allFiles()).get
+        val schema = fileType match {
+          case "parquet" =>
+            val inferredSchema =
+              defaultSource.inferSchema(sparkSession, options, fileCatalog.allFiles())
+
+            // For Parquet, get correct schema by merging Metastore schema data types

--- End diff --

Sure. Will change to return metastoreSchema for parq as well.
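The review exchange above settles on returning the metastore schema for Parquet rather than merging it with the inferred one. A toy sketch of that per-format dispatch, with schemas reduced to opaque strings (the real code works with `StructType` and a `FileFormat` source; every name below is invented for illustration):

```scala
// Toy model of the file-type dispatch discussed above. Schemas are opaque
// strings here, not StructType; SchemaChoice is not a Spark class.
object SchemaChoice {
  def choose(fileType: String, metastoreSchema: String, inferred: Option[String]): String =
    fileType match {
      case "parquet" =>
        // Per the review, prefer the metastore schema for Parquet as well.
        metastoreSchema
      case _ =>
        // Other formats fall back to the inferred schema when one exists.
        inferred.getOrElse(metastoreSchema)
    }
}
```

The design point is that the metastore is the source of truth for declared column types, while inference is only a fallback when no reliable declaration exists.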
[GitHub] spark issue #15005: [SPARK-17421] [DOCS] Documenting the current treatment o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15005 **[Test build #3286 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3286/consoleFull)** for PR 15005 at commit [`53a09cd`](https://github.com/apache/spark/commit/53a09cd5783d55048b2cf7579cf53ccc76bdf3d7).
[GitHub] spark issue #15191: [SPARK-17628][Streaming][Examples] change name "Streamin...
Github user keypointt commented on the issue: https://github.com/apache/spark/pull/15191 oh I see...sorry...I'll close this one
[GitHub] spark pull request #15191: [SPARK-17628][Streaming][Examples] change name "S...
Github user keypointt closed the pull request at: https://github.com/apache/spark/pull/15191
[GitHub] spark issue #15190: [SPARK-17620][SQL] hive.default.fileformat=orc does not ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15190 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65752/ Test PASSed.
[GitHub] spark issue #15190: [SPARK-17620][SQL] hive.default.fileformat=orc does not ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15190 Merged build finished. Test PASSed.
[GitHub] spark issue #15193: [SQL]RowBasedKeyValueBatch reuse valueRow too
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15193 Can one of the admins verify this patch?
[GitHub] spark issue #15190: [SPARK-17620][SQL] hive.default.fileformat=orc does not ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15190 **[Test build #65752 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65752/consoleFull)** for PR 15190 at commit [`f2b93de`](https://github.com/apache/spark/commit/f2b93de629f378ca99f8d3086ade8dc05b41a912). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #15139: [SPARK-17315][Follow-up][SparkR][ML] Fix print of...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15139
[GitHub] spark issue #15193: [SQL]RowBasedKeyValueBatch reuse valueRow too
Github user yaooqinn commented on the issue: https://github.com/apache/spark/pull/15193 cc @ooq @sameeragarwal @davies is it right and necessary?
[GitHub] spark issue #15192: [SPARK-14536] [SQL] fix to handle null value in array ty...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15192 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65751/ Test PASSed.
[GitHub] spark issue #15139: [SPARK-17315][Follow-up][SparkR][ML] Fix print of Kolmog...
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/15139 I will merge this into master. If anyone has more comments, I can address them at follow-up work. Thanks for your review. @felixcheung
[GitHub] spark issue #15192: [SPARK-14536] [SQL] fix to handle null value in array ty...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15192 Merged build finished. Test PASSed.
[GitHub] spark pull request #15193: [SQL]RowBasedKeyValueBatch reuse valueRow too
GitHub user yaooqinn opened a pull request: https://github.com/apache/spark/pull/15193

[SQL]RowBasedKeyValueBatch reuse valueRow too

## What changes were proposed in this pull request?

reuse the cached valueRow in RowBasedKeyValueBatch

## How was this patch tested?

existing ut

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yaooqinn/spark reuse-value

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15193.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15193

commit 0f60e107904fa4d0e92185bd9fae214ee70a1a11
Author: Kent Yao
Date: 2016-09-22T02:59:23Z

    reuse valueRow too
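The PR above proposes caching a single mutable value row and re-pointing it at each entry rather than allocating a fresh object per lookup. An invented miniature of that reuse pattern (this is not Spark's actual `RowBasedKeyValueBatch` API; every name below is made up for illustration):

```scala
// Invented miniature of the row-reuse idea: one mutable holder is cached
// and re-pointed at each entry, so repeated lookups allocate nothing.
final class ValueHolder {
  private var data: Array[Int] = Array.empty
  private var offset: Int = 0
  // Re-point this holder at a new backing array and position, returning itself.
  def pointTo(d: Array[Int], o: Int): ValueHolder = { data = d; offset = o; this }
  def get: Int = data(offset)
}

final class TinyBatch(values: Array[Int]) {
  private val reusedValue = new ValueHolder // cached once, reused on every access
  def valueAt(i: Int): ValueHolder = reusedValue.pointTo(values, i)
}
```

The trade-off is the usual one for reused rows: callers must copy the holder if they retain it across calls, since the next `valueAt` overwrites its contents.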
[GitHub] spark issue #15191: [SPARK-17628][Streaming][Examples] change name "Streamin...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15191 "Foobars" is a common name in Java / Scala for "static methods related to Foobar objects". I think the current name is fine. It's not really an API anyway, just a component of an example.
[GitHub] spark issue #15192: [SPARK-14536] [SQL] fix to handle null value in array ty...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15192 **[Test build #65751 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65751/consoleFull)** for PR 15192 at commit [`9eb40db`](https://github.com/apache/spark/commit/9eb40dbcdb0894e699a38e6dc4f44dc97408f63c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14988: [SPARK-17425][SQL] Override sameResult in HiveTab...
Github user watermen commented on a diff in the pull request: https://github.com/apache/spark/pull/14988#discussion_r79970830

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScanExec.scala ---
@@ -164,4 +164,19 @@ case class HiveTableScanExec(
   }

   override def output: Seq[Attribute] = attributes
+
+  override def sameResult(plan: SparkPlan): Boolean = plan match {
+    case other: HiveTableScanExec =>
+      val thisPredicates = partitionPruningPred.map(cleanExpression)
+      val otherPredicates = other.partitionPruningPred.map(cleanExpression)
+
+      val result = relation.sameResult(other.relation) &&
+        output.length == other.output.length &&
+        output.zip(other.output)
+          .forall(p => p._1.name == p._2.name && p._1.dataType == p._2.dataType) &&

--- End diff --

@cloud-fan
[GitHub] spark pull request #15131: [SPARK-17577][SparkR][Core] SparkR support add fi...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15131
[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...
Github user JustinPihony commented on the issue: https://github.com/apache/spark/pull/12601 @srowen Ping. I don't think there is anything on my plate. This should be mergeable
[GitHub] spark issue #15131: [SPARK-17577][SparkR][Core] SparkR support add files to ...
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/15131 I will merge this into master. If anyone has more comments, I can address them at follow up work. Thanks for your review. @felixcheung @HyukjinKwon @shivaram
[GitHub] spark issue #15090: [SPARK-17073] [SQL] generate column-level statistics
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15090 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65749/ Test PASSed.
[GitHub] spark issue #15090: [SPARK-17073] [SQL] generate column-level statistics
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15090 Merged build finished. Test PASSed.
[GitHub] spark issue #15090: [SPARK-17073] [SQL] generate column-level statistics
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15090 **[Test build #65749 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65749/consoleFull)** for PR 15090 at commit [`ec02b2a`](https://github.com/apache/spark/commit/ec02b2a8b7bfb9c10d4d47e2678a44ec0f2f8af8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15090: [SPARK-17073] [SQL] generate column-level statistics
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15090 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65746/ Test PASSed.
[GitHub] spark issue #15090: [SPARK-17073] [SQL] generate column-level statistics
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15090 Merged build finished. Test PASSed.
[GitHub] spark issue #14851: [SPARK-17281][ML][MLLib] Add treeAggregateDepth paramete...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14851 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65753/ Test PASSed.
[GitHub] spark issue #14851: [SPARK-17281][ML][MLLib] Add treeAggregateDepth paramete...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14851 Merged build finished. Test PASSed.
[GitHub] spark issue #14851: [SPARK-17281][ML][MLLib] Add treeAggregateDepth paramete...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14851 **[Test build #65753 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65753/consoleFull)** for PR 14851 at commit [`378079d`](https://github.com/apache/spark/commit/378079d4778b4902b3d6956c504e22555aa2884c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15090: [SPARK-17073] [SQL] generate column-level statistics
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15090 **[Test build #65746 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65746/consoleFull)** for PR 15090 at commit [`ec02b2a`](https://github.com/apache/spark/commit/ec02b2a8b7bfb9c10d4d47e2678a44ec0f2f8af8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14124 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65747/ Test PASSed.
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14124 Merged build finished. Test PASSed.
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14124 **[Test build #65747 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65747/consoleFull)** for PR 14124 at commit [`0bc06c6`](https://github.com/apache/spark/commit/0bc06c6e3e931a5f317e043aa5eeea97083b9860). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #15090: [SPARK-17073] [SQL] generate column-level statist...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/15090#discussion_r79968621

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsColumnSuite.scala ---

```scala
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.sql

import java.sql.{Date, Timestamp}

import org.apache.spark.sql.catalyst.parser.ParseException
import org.apache.spark.sql.catalyst.plans.logical.ColumnStats
import org.apache.spark.sql.catalyst.util.DateTimeUtils
import org.apache.spark.sql.execution.command.AnalyzeColumnCommand
import org.apache.spark.sql.types._

class StatisticsColumnSuite extends StatisticsTest {
  import testImplicits._

  test("parse analyze column commands") {
    def assertAnalyzeColumnCommand(analyzeCommand: String, c: Class[_]) {
      val parsed = spark.sessionState.sqlParser.parsePlan(analyzeCommand)
      val operators = parsed.collect {
        case a: AnalyzeColumnCommand => a
        case o => o
      }
      assert(operators.size == 1)
      if (operators.head.getClass != c) {
        fail(
          s"""$analyzeCommand expected command: $c, but got ${operators.head}
             |parsed command:
             |$parsed
           """.stripMargin)
      }
    }

    val table = "table"
    assertAnalyzeColumnCommand(
      s"ANALYZE TABLE $table COMPUTE STATISTICS FOR COLUMNS key, value",
      classOf[AnalyzeColumnCommand])

    intercept[ParseException] {
      sql(s"ANALYZE TABLE $table COMPUTE STATISTICS FOR COLUMNS")
    }
  }

  test("check correctness of columns") {
    val table = "tbl"
    val colName1 = "abc"
    val colName2 = "x.yz"
    val quotedColName2 = s"`$colName2`"
    withTable(table) {
      sql(s"CREATE TABLE $table ($colName1 int, $quotedColName2 string) USING PARQUET")

      val invalidColError = intercept[AnalysisException] {
        sql(s"ANALYZE TABLE $table COMPUTE STATISTICS FOR COLUMNS key")
      }
      assert(invalidColError.message == "Invalid column name: key.")

      withSQLConf("spark.sql.caseSensitive" -> "true") {
        val invalidErr = intercept[AnalysisException] {
          sql(s"ANALYZE TABLE $table COMPUTE STATISTICS FOR COLUMNS ${colName1.toUpperCase}")
        }
        assert(invalidErr.message == s"Invalid column name: ${colName1.toUpperCase}.")
      }

      withSQLConf("spark.sql.caseSensitive" -> "false") {
        val columnsToAnalyze = Seq(colName2.toUpperCase, colName1, colName2)
        val columnStats = spark.sessionState.computeColumnStats(table, columnsToAnalyze)
```
--- End diff --

Thanks!
[GitHub] spark issue #15185: [SPARK-17618] Fix invalid comparisons between UnsafeRow ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15185 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65750/ Test PASSed.
[GitHub] spark issue #15185: [SPARK-17618] Fix invalid comparisons between UnsafeRow ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15185 Merged build finished. Test PASSed.
[GitHub] spark issue #15185: [SPARK-17618] Fix invalid comparisons between UnsafeRow ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15185 **[Test build #65750 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65750/consoleFull)** for PR 15185 at commit [`1319e82`](https://github.com/apache/spark/commit/1319e8281ab3ec14a5ba11fca0261d19b7890ad3). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14912: [SPARK-17357][SQL] Fix current predicate pushdown
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14912 ping @cloud-fan @hvanhovell Can you review this if you have time? Thanks!
[GitHub] spark issue #14780: [SPARK-17206][SQL] Support ANALYZE TABLE on analyzable t...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14780 @hvanhovell ok. Thanks!
[GitHub] spark issue #15046: [SPARK-17492] [SQL] Fix Reading Cataloged Data Sources w...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15046 This is a new issue in Spark 2.1, introduced after we started physically storing the inferred schema in the metastore. BTW, I also ran the test cases on Spark 2.0, and they work well there.
[GitHub] spark issue #14851: [SPARK-17281][ML][MLLib] Add treeAggregateDepth paramete...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14851 **[Test build #65753 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65753/consoleFull)** for PR 14851 at commit [`378079d`](https://github.com/apache/spark/commit/378079d4778b4902b3d6956c504e22555aa2884c).
[GitHub] spark issue #15190: [SPARK-17620][SQL] hive.default.fileformat=orc does not ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15190 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65744/ Test PASSed.
[GitHub] spark issue #15190: [SPARK-17620][SQL] hive.default.fileformat=orc does not ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15190 Merged build finished. Test PASSed.
[GitHub] spark issue #15190: [SPARK-17620][SQL] hive.default.fileformat=orc does not ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15190 **[Test build #65744 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65744/consoleFull)** for PR 15190 at commit [`f60e760`](https://github.com/apache/spark/commit/f60e760989ff732aa50d4bea3794e1261bc1a0cc). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15071: [SPARK-17517][SQL]Improve generated Code for BroadcastHa...
Github user yaooqinn commented on the issue: https://github.com/apache/spark/pull/15071 @hvanhovell I think variable-length fields may lead to memory overlap in the `BuildLeft` case, since we reuse the `BufferHolder` to avoid writing the stream side repeatedly. In that case, the holder cannot `grow` properly to keep the left side from overlapping the right side. With `BuildRight` there is no such problem.
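To illustrate the kind of hazard being described, here is a deliberately simplified, hypothetical sketch (not Spark's actual `BufferHolder` from `catalyst.expressions.codegen`) of how resetting a reused growable buffer without accounting for bytes that must be retained lets a later write clobber an earlier row:

```scala
import java.nio.charset.StandardCharsets

// Toy model of a growable row buffer. A real BufferHolder tracks a cursor
// into a backing array and grows the array on demand; the names here are
// illustrative only.
final class SimpleBufferHolder(initialSize: Int = 8) {
  var buffer: Array[Byte] = new Array[Byte](initialSize)
  var cursor: Int = 0

  // Ensure `needed` more bytes fit, doubling the backing array as required.
  def grow(needed: Int): Unit = {
    if (cursor + needed > buffer.length) {
      val newBuffer = new Array[Byte](math.max(buffer.length * 2, cursor + needed))
      System.arraycopy(buffer, 0, newBuffer, 0, cursor)
      buffer = newBuffer
    }
  }

  // Append bytes at the cursor and return the offset they were written at.
  def write(bytes: Array[Byte]): Int = {
    grow(bytes.length)
    val offset = cursor
    System.arraycopy(bytes, 0, buffer, offset, bytes.length)
    cursor += bytes.length
    offset
  }
}

object BufferReuseDemo {
  def main(args: Array[String]): Unit = {
    val holder = new SimpleBufferHolder()
    // Stream-side ("left") row, written once and meant to be reused.
    val leftOffset = holder.write("left".getBytes(StandardCharsets.UTF_8))
    // Buggy reuse: resetting the cursor ignores the retained left bytes,
    // so the next ("right") row is written on top of them.
    holder.cursor = 0
    holder.write("RIGHT".getBytes(StandardCharsets.UTF_8))
    val left = new String(holder.buffer, leftOffset, 4, StandardCharsets.UTF_8)
    println(left) // no longer "left": the right side has overlapped it
  }
}
```

The fix, conceptually, is that reuse must either re-write the retained side or reserve its region before appending new data.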
[GitHub] spark issue #15146: [SPARK-17590][SQL] Analyze CTE definitions at once and a...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15146 @hvanhovell @cloud-fan Thanks!
[GitHub] spark issue #15190: [SPARK-17620][SQL] hive.default.fileformat=orc does not ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15190 Please update the PR description. This is not for `orc` only.
[GitHub] spark pull request #15090: [SPARK-17073] [SQL] generate column-level statist...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15090#discussion_r79965497

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsTest.scala ---

```scala
package org.apache.spark.sql

import org.apache.spark.sql.catalyst.plans.logical.{ColumnStats, Statistics}
import org.apache.spark.sql.execution.datasources.LogicalRelation
import org.apache.spark.sql.test.SharedSQLContext
import org.apache.spark.sql.types._

trait StatisticsTest extends QueryTest with SharedSQLContext {

  def checkColStats(
      df: DataFrame,
      expectedColStatsSeq: Seq[(String, ColumnStats)]): Unit = {
    val table = "tbl"
    withTable(table) {
      df.write.format("json").saveAsTable(table)
      val columns = expectedColStatsSeq.map(_._1)
      val columnStats = spark.sessionState.computeColumnStats(table, columns)
```
--- End diff --

Change this too.
[GitHub] spark pull request #15090: [SPARK-17073] [SQL] generate column-level statist...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15090#discussion_r79965425

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsColumnSuite.scala ---

```scala
      withSQLConf("spark.sql.caseSensitive" -> "false") {
        val columnsToAnalyze = Seq(colName2.toUpperCase, colName1, colName2)
        val columnStats = spark.sessionState.computeColumnStats(table, columnsToAnalyze)
```
--- End diff --

Here, you can just replace it by

```scala
val tableIdent = TableIdentifier(table, Option("default"))
val relation = spark.sessionState.catalog.lookupRelation(tableIdent)
val columnStats =
  AnalyzeColumnCommand(tableIdent, columnsToAnalyze).computeColStats(spark, relation)._2
```
[GitHub] spark pull request #15090: [SPARK-17073] [SQL] generate column-level statist...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15090#discussion_r79965370

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SessionState.scala ---

```diff
@@ -186,13 +187,27 @@ private[sql] class SessionState(sparkSession: SparkSession) {
   }
 
   /**
-   * Analyzes the given table in the current database to generate statistics, which will be
+   * Analyzes the given table in the current database to generate table-level statistics, which
+   * will be used in query optimizations.
+   */
+  def analyzeTable(tableIdent: TableIdentifier, noscan: Boolean = true): Unit = {
+    AnalyzeTableCommand(tableIdent, noscan).run(sparkSession)
+  }
+
+  /**
+   * Analyzes the given columns in the table to generate column-level statistics, which will be
    * used in query optimizations.
-   *
-   * Right now, it only supports catalog tables and it only updates the size of a catalog table
-   * in the external catalog.
    */
-  def analyze(tableName: String, noscan: Boolean = true): Unit = {
-    AnalyzeTableCommand(tableName, noscan).run(sparkSession)
+  def analyzeTableColumns(tableIdent: TableIdentifier, columnNames: Seq[String]): Unit = {
+    AnalyzeColumnCommand(tableIdent, columnNames).run(sparkSession)
+  }
+
+  // This api is used for testing.
+  def computeColumnStats(tableName: String, columnNames: Seq[String]): Map[String, ColumnStats] = {
```
--- End diff --

Avoid adding any testing-only API, if possible.
[GitHub] spark issue #15152: [SPARK-17365][Core] Remove/Kill multiple executors toget...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15152 Merged build finished. Test PASSed.
[GitHub] spark issue #15152: [SPARK-17365][Core] Remove/Kill multiple executors toget...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15152 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65740/
[GitHub] spark issue #15190: [SPARK-17620][SQL] hive.default.fileformat=orc does not ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15190 **[Test build #65752 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65752/consoleFull)** for PR 15190 at commit [`f2b93de`](https://github.com/apache/spark/commit/f2b93de629f378ca99f8d3086ade8dc05b41a912).
[GitHub] spark issue #15152: [SPARK-17365][Core] Remove/Kill multiple executors toget...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15152 **[Test build #65740 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65740/consoleFull)** for PR 15152 at commit [`3d2fac4`](https://github.com/apache/spark/commit/3d2fac45f72dd56e03486bb269baa138cefe4e2e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15189: [SPARK-17549][sql] Coalesce cached relation stats in dri...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15189 Merged build finished. Test PASSed.
[GitHub] spark issue #15189: [SPARK-17549][sql] Coalesce cached relation stats in dri...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15189 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65743/
[GitHub] spark pull request #15190: [SPARK-17620][SQL] hive.default.fileformat=orc do...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/15190#discussion_r79964807

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveDDLCommandSuite.scala ---

```diff
@@ -556,4 +558,32 @@ class HiveDDLCommandSuite extends PlanTest {
     assert(partition2.get.apply("c") == "1" && partition2.get.apply("d") == "2")
   }
 
+  test("Test default fileformat") {
+    withSQLConf("hive.default.fileformat" -> "orc") {
+      val s1 =
+        s"""
+           |CREATE TABLE IF NOT EXISTS fileformat_test (id int)
+         """.stripMargin
+      val (desc, exists) = extractTableDesc(s1)
+      assert(exists)
+      assert(desc.storage.inputFormat == Some("org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"))
+      assert(desc.storage.outputFormat == Some("org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat"))
+      assert(desc.storage.serde == Some("org.apache.hadoop.hive.ql.io.orc.OrcSerde"))
+    }
+
+    withSQLConf("hive.default.fileformat" -> "parquet") {
+      val s1 =
+        s"""
+           |CREATE TABLE IF NOT EXISTS fileformat_test (id int)
+         """.stripMargin
+      val (desc, exists) = extractTableDesc(s1)
```

--- End diff --

@gatorsmile Thanks !! I have updated as per your comments.
[GitHub] spark issue #15189: [SPARK-17549][sql] Coalesce cached relation stats in dri...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15189 **[Test build #65743 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65743/consoleFull)** for PR 15189 at commit [`5b3a65a`](https://github.com/apache/spark/commit/5b3a65a02210c696206546c43403867bcc9eb077).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class ColStatsAccumulator(originalOutput: Seq[Attribute])`
[GitHub] spark pull request #15190: [SPARK-17620][SQL] hive.default.fileformat=orc do...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15190#discussion_r79964525

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveDDLCommandSuite.scala ---

```diff
@@ -556,4 +558,32 @@ class HiveDDLCommandSuite extends PlanTest {
     assert(partition2.get.apply("c") == "1" && partition2.get.apply("d") == "2")
   }
 
+  test("Test default fileformat") {
+    withSQLConf("hive.default.fileformat" -> "orc") {
+      val s1 =
+        s"""
+           |CREATE TABLE IF NOT EXISTS fileformat_test (id int)
+         """.stripMargin
+      val (desc, exists) = extractTableDesc(s1)
+      assert(exists)
+      assert(desc.storage.inputFormat == Some("org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"))
+      assert(desc.storage.outputFormat == Some("org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat"))
+      assert(desc.storage.serde == Some("org.apache.hadoop.hive.ql.io.orc.OrcSerde"))
+    }
+
+    withSQLConf("hive.default.fileformat" -> "parquet") {
+      val s1 =
+        s"""
+           |CREATE TABLE IF NOT EXISTS fileformat_test (id int)
+         """.stripMargin
+      val (desc, exists) = extractTableDesc(s1)
```

--- End diff --

The same here.
[GitHub] spark pull request #15190: [SPARK-17620][SQL] hive.default.fileformat=orc do...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15190#discussion_r79964497

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveDDLCommandSuite.scala ---

```diff
@@ -556,4 +558,32 @@ class HiveDDLCommandSuite extends PlanTest {
     assert(partition2.get.apply("c") == "1" && partition2.get.apply("d") == "2")
   }
 
+  test("Test default fileformat") {
+    withSQLConf("hive.default.fileformat" -> "orc") {
+      val s1 =
+        s"""
+           |CREATE TABLE IF NOT EXISTS fileformat_test (id int)
+         """.stripMargin
+      val (desc, exists) = extractTableDesc(s1)
```

--- End diff --

```Scala
val (desc, exists) = extractTableDesc("CREATE TABLE IF NOT EXISTS fileformat_test (id int)")
```
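The inlined call suggested above also generalizes to a table-driven check that exercises every default file format in one loop instead of duplicating the `withSQLConf` block per format. A minimal self-contained sketch — the `ExpectedStorage` and `DefaultFileFormats` names are hypothetical, and only the `orc` class names are taken from the quoted test:

```scala
// Hypothetical table of expected Hive storage classes, keyed by the value
// of hive.default.fileformat. Only the "orc" entry is copied verbatim from
// the test quoted in the diff above.
final case class ExpectedStorage(inputFormat: String, outputFormat: String, serde: String)

object DefaultFileFormats {
  val byFormat: Map[String, ExpectedStorage] = Map(
    "orc" -> ExpectedStorage(
      "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat",
      "org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat",
      "org.apache.hadoop.hive.ql.io.orc.OrcSerde"))
}
```

Inside `HiveDDLCommandSuite`, one could then iterate `DefaultFileFormats.byFormat`, wrap each entry in `withSQLConf("hive.default.fileformat" -> fmt)`, call `extractTableDesc("CREATE TABLE IF NOT EXISTS fileformat_test (id int)")`, and assert the three storage classes against the entry, so adding a new format is a one-line change to the map.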
[GitHub] spark issue #15154: [SPARK-17494] [SQL] changePrecision() on compact decimal...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15154 Merged build finished. Test PASSed.