[GitHub] spark pull request #15190: [SPARK-17620][SQL] Use the storage format specifi...
Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15190#discussion_r79979332

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
    @@ -988,9 +988,7 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
               .orElse(Some("org.apache.hadoop.mapred.TextInputFormat")),
             outputFormat = defaultHiveSerde.flatMap(_.outputFormat)
               .orElse(Some("org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat")),
    -        // Note: Keep this unspecified because we use the presence of the serde to decide
    --- End diff --

@viirya @cloud-fan Actually, I am not sure the comment above is still in sync with the code. When this comment was written, we used CreateTableAsSelectLogicalPlan to represent the CTAS case, and we checked for the serde's presence to decide whether to convert it to a data source table, like the following.

``` scala
if (sessionState.convertCTAS && table.storage.serde.isEmpty) {
  // Do the conversion when spark.sql.hive.convertCTAS is true and the query
  // does not specify any storage format (file format and storage handler).
  if (table.identifier.database.isDefined) {
    throw new AnalysisException(
      "Cannot specify database name in a CTAS statement " +
        "when spark.sql.hive.convertCTAS is set to true.")
  }

  val mode = if (allowExisting) SaveMode.Ignore else SaveMode.ErrorIfExists
  CreateTableUsingAsSelect(
    TableIdentifier(desc.identifier.table),
    conf.defaultDataSourceName,
    temporary = false,
    Array.empty[String],
    bucketSpec = None,
    mode,
    options = Map.empty[String, String],
    child
  )
} else {
  val desc = if (table.storage.serde.isEmpty) {
    // add default serde
    table.withNewStorage(
      serde = Some("org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"))
  } else {
    table
  }
```

I think this code has since changed and moved to SparkSqlParser?

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well.
If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15190: [SPARK-17620][SQL] Use the storage format specifi...
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15190#discussion_r79978495

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
    @@ -988,9 +988,7 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
               .orElse(Some("org.apache.hadoop.mapred.TextInputFormat")),
             outputFormat = defaultHiveSerde.flatMap(_.outputFormat)
               .orElse(Some("org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat")),
    -        // Note: Keep this unspecified because we use the presence of the serde to decide
    --- End diff --

The current checking conditions are based on [ctx.createFileFormat and ctx.rowFormat](https://github.com/dilipbiswal/spark/blob/f2b93de629f378ca99f8d3086ade8dc05b41a912/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala#L1051-L1052). Thus, I think this PR looks ok. :)
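For readers following the thread, here is a minimal sketch of the kind of check described above, where the CTAS conversion depends on whether the statement carried a file format or row format clause rather than on the serde being empty. It is an illustration under stated assumptions, not the code at the linked lines: `CreateTableCtx`, `CtasPlan`, and `planCtas` are hypothetical stand-ins, and an `Option` models whether a clause was present in the statement.

```scala
object CtasSketch {
  // Hypothetical stand-in for the parser context (not the real generated class).
  // None means the clause was absent from the CREATE TABLE statement.
  final case class CreateTableCtx(
      createFileFormat: Option[String],
      rowFormat: Option[String])

  // Hypothetical result type: which write path the CTAS would take.
  sealed trait CtasPlan
  case object DataSourceCtas extends CtasPlan // converted to a data source table
  case object HiveSerdeCtas extends CtasPlan  // kept as a Hive serde table

  // Assumed rule: convert only when the convertCTAS flag is on and the user
  // specified neither a file format nor a row format.
  def planCtas(convertCTAS: Boolean, ctx: CreateTableCtx): CtasPlan = {
    val hasStorageProperties =
      ctx.createFileFormat.isDefined || ctx.rowFormat.isDefined
    if (convertCTAS && !hasStorageProperties) DataSourceCtas else HiveSerdeCtas
  }
}
```

Under this rule, filling in a default serde on the table description would not by itself change which path is taken, because the decision reads only the clauses present in the statement.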
[GitHub] spark pull request #15190: [SPARK-17620][SQL] Use the storage format specifi...
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15190#discussion_r79978157

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
    @@ -988,9 +988,7 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
               .orElse(Some("org.apache.hadoop.mapred.TextInputFormat")),
             outputFormat = defaultHiveSerde.flatMap(_.outputFormat)
               .orElse(Some("org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat")),
    -        // Note: Keep this unspecified because we use the presence of the serde to decide
    --- End diff --

The comment is not valid now. This was removed by the PR: https://github.com/apache/spark/pull/13386
[GitHub] spark pull request #15190: [SPARK-17620][SQL] Use the storage format specifi...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15190#discussion_r79977535

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
    @@ -988,9 +988,7 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
               .orElse(Some("org.apache.hadoop.mapred.TextInputFormat")),
             outputFormat = defaultHiveSerde.flatMap(_.outputFormat)
               .orElse(Some("org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat")),
    -        // Note: Keep this unspecified because we use the presence of the serde to decide
    --- End diff --

cc @yhuai to confirm
[GitHub] spark pull request #15190: [SPARK-17620][SQL] Use the storage format specifi...
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15190#discussion_r79976580

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
    @@ -988,9 +988,7 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
               .orElse(Some("org.apache.hadoop.mapred.TextInputFormat")),
             outputFormat = defaultHiveSerde.flatMap(_.outputFormat)
               .orElse(Some("org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat")),
    -        // Note: Keep this unspecified because we use the presence of the serde to decide
    --- End diff --

I think this is kept unspecified because the table is intended to be written through the Hive write path. If we specify the serde here, it will be converted to a data source table. Is that ok? cc @cloud-fan