[GitHub] [spark] cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax
cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax

URL: https://github.com/apache/spark/pull/26736#discussion_r354689784

## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveSerDeSuite.scala

```diff
@@ -83,15 +83,18 @@ class HiveSerDeSuite extends HiveComparisonTest with PlanTest with BeforeAndAfte
   }

   test("Test the default fileformat for Hive-serde tables") {
-    withSQLConf("hive.default.fileformat" -> "orc") {
-      val (desc, exists) = extractTableDesc("CREATE TABLE IF NOT EXISTS fileformat_test (id int)")
+    withSQLConf("hive.default.fileformat" -> "orc",
+      SQLConf.LEGACY_CREATE_HIVE_TABLE_BY_DEFAULT_ENABLED.key -> "true") {
+      val (desc, exists) = extractTableDesc(
+        "CREATE TABLE IF NOT EXISTS fileformat_test (id int)")
       assert(exists)
       assert(desc.storage.inputFormat == Some("org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"))
       assert(desc.storage.outputFormat == Some("org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat"))
       assert(desc.storage.serde == Some("org.apache.hadoop.hive.ql.io.orc.OrcSerde"))
     }

-    withSQLConf("hive.default.fileformat" -> "parquet") {
+    withSQLConf("hive.default.fileformat" -> "parquet",
+      SQLConf.LEGACY_CREATE_HIVE_TABLE_BY_DEFAULT_ENABLED.key -> "true") {
```

Review comment: ditto

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards,
Apache Git Services

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax

URL: https://github.com/apache/spark/pull/26736#discussion_r354689734

## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveSerDeSuite.scala

```diff
@@ -83,15 +83,18 @@ class HiveSerDeSuite extends HiveComparisonTest with PlanTest with BeforeAndAfte
   }

   test("Test the default fileformat for Hive-serde tables") {
-    withSQLConf("hive.default.fileformat" -> "orc") {
-      val (desc, exists) = extractTableDesc("CREATE TABLE IF NOT EXISTS fileformat_test (id int)")
+    withSQLConf("hive.default.fileformat" -> "orc",
+      SQLConf.LEGACY_CREATE_HIVE_TABLE_BY_DEFAULT_ENABLED.key -> "true") {
+      val (desc, exists) = extractTableDesc(
+        "CREATE TABLE IF NOT EXISTS fileformat_test (id int)")
```

Review comment: shall we just add `using hive`?
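The `using hive` suggestion can be made concrete with a one-line sketch. This is only an illustration of the proposed test change, not the final patch:

```sql
-- Sketch of the reviewer's suggestion: spell the Hive provider out in the
-- statement itself, so the test no longer depends on the legacy conf
-- spark.sql.legacy.createHiveTableByDefault.enabled being true.
CREATE TABLE IF NOT EXISTS fileformat_test (id int) USING hive
```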
cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax

URL: https://github.com/apache/spark/pull/26736#discussion_r354689535

## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveShowCreateTableSuite.scala

```diff
@@ -177,17 +178,24 @@ class HiveShowCreateTableSuite extends ShowCreateTableSuite with TestHiveSinglet
   }

   test("SPARK-24911: keep quotes for nested fields in hive") {
-    withTable("t1") {
-      val createTable = "CREATE TABLE `t1`(`a` STRUCT<`b`: STRING>)"
-      sql(createTable)
-      val shownDDL = sql(s"SHOW CREATE TABLE t1")
-        .head()
-        .getString(0)
-        .split("\n")
-        .head
-      assert(shownDDL == createTable)
+    val originalCreateHiveTable = TestHive.conf.createHiveTableByDefaultEnabled
+    try {
+      TestHive.setConf(SQLConf.LEGACY_CREATE_HIVE_TABLE_BY_DEFAULT_ENABLED, true)
+      withTable("t1") {
+        val createTable = "CREATE TABLE `t1`(`a` STRUCT<`b`: STRING>)"
```

Review comment: shall we just add `using hive`?
cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax

URL: https://github.com/apache/spark/pull/26736#discussion_r354673906

## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala

```diff
@@ -48,6 +49,28 @@ class DDLParserSuite extends AnalysisTest {
     comparePlans(parsePlan(sql), expected, checkAnalysis = false)
   }

+  test("SPARK-30098: create table without provider should " +
+    "use default data source under non-legacy mode") {
+    withSQLConf(SQLConf.LEGACY_CREATE_HIVE_TABLE_BY_DEFAULT_ENABLED.key -> "false") {
```

Review comment: let's remove this `withSQLConf` to show that it's the default behavior.
cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax

URL: https://github.com/apache/spark/pull/26736#discussion_r354122671

## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveMetastoreCatalogSuite.scala

```diff
@@ -44,9 +44,16 @@ class HiveMetastoreCatalogSuite extends TestHiveSingleton with SQLTestUtils {
   }

   test("duplicated metastore relations") {
-    val df = spark.sql("SELECT * FROM src")
-    logInfo(df.queryExecution.toString)
-    df.as('a).join(df.as('b), $"a.key" === $"b.key")
+    val originalCreateHiveTable = TestHive.conf.createHiveTableByDefaultEnabled
+    try {
+      TestHive.conf.setConf(SQLConf.LEGACY_CREATE_HIVE_TABLE_BY_DEFAULT_ENABLED, true)
+      val df = spark.sql("SELECT * FROM src")
```

Review comment:
> how about setting legacy mode for the TestHiveContext

Then none of the tests in the hive module would exercise the new behavior, which decreases test coverage. I don't see these as "queries". They are just data preparations, and we should create hive tables for them.
cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax

URL: https://github.com/apache/spark/pull/26736#discussion_r353750839

## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveMetastoreCatalogSuite.scala

```diff
@@ -44,9 +44,16 @@ class HiveMetastoreCatalogSuite extends TestHiveSingleton with SQLTestUtils {
   }

   test("duplicated metastore relations") {
-    val df = spark.sql("SELECT * FROM src")
-    logInfo(df.queryExecution.toString)
-    df.as('a).join(df.as('b), $"a.key" === $"b.key")
+    val originalCreateHiveTable = TestHive.conf.createHiveTableByDefaultEnabled
+    try {
+      TestHive.conf.setConf(SQLConf.LEGACY_CREATE_HIVE_TABLE_BY_DEFAULT_ENABLED, true)
+      val df = spark.sql("SELECT * FROM src")
```

Review comment: I believe many of these changes can be reverted if we fix `TestHive`.
cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax

URL: https://github.com/apache/spark/pull/26736#discussion_r353750238

## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveMetastoreCatalogSuite.scala

```diff
@@ -44,9 +44,16 @@ class HiveMetastoreCatalogSuite extends TestHiveSingleton with SQLTestUtils {
   }

   test("duplicated metastore relations") {
-    val df = spark.sql("SELECT * FROM src")
-    logInfo(df.queryExecution.toString)
-    df.as('a).join(df.as('b), $"a.key" === $"b.key")
+    val originalCreateHiveTable = TestHive.conf.createHiveTableByDefaultEnabled
+    try {
+      TestHive.conf.setConf(SQLConf.LEGACY_CREATE_HIVE_TABLE_BY_DEFAULT_ENABLED, true)
+      val df = spark.sql("SELECT * FROM src")
```

Review comment: It's created in `TestHive`. `TestHive` should always create hive tables; can we update it to write `STORED AS`?
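What "update it to write `STORED AS`" might look like for the `src` table is sketched below. The actual TestHive definition may differ, and the column names here are only inferred from the test's use of `a.key`:

```sql
-- Hypothetical DDL for TestHive's src table: an explicit STORED AS clause
-- keeps src a Hive-serde table even when the default provider changes.
CREATE TABLE src (key INT, value STRING) STORED AS textfile
```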
cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax

URL: https://github.com/apache/spark/pull/26736#discussion_r353748634

## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveMetastoreCatalogSuite.scala

```diff
@@ -322,40 +329,47 @@ class DataSourceWithHiveMetastoreCatalogSuite
   }

   test("SPARK-27592 set the partitioned bucketed data source table SerDe correctly") {
-    val provider = "parquet"
-    withTable("t") {
-      spark.sql(
-        s"""
-           |CREATE TABLE t
-           |USING $provider
-           |PARTITIONED BY (p)
-           |CLUSTERED BY (key)
-           |SORTED BY (value)
-           |INTO 2 BUCKETS
-           |AS SELECT key, value, cast(key % 3 as string) as p FROM src
```

Review comment: is it because of `src` again?
cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax

URL: https://github.com/apache/spark/pull/26736#discussion_r353748296

## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveMetastoreCatalogSuite.scala

```diff
@@ -44,9 +44,16 @@ class HiveMetastoreCatalogSuite extends TestHiveSingleton with SQLTestUtils {
   }

   test("duplicated metastore relations") {
-    val df = spark.sql("SELECT * FROM src")
-    logInfo(df.queryExecution.toString)
-    df.as('a).join(df.as('b), $"a.key" === $"b.key")
+    val originalCreateHiveTable = TestHive.conf.createHiveTableByDefaultEnabled
+    try {
+      TestHive.conf.setConf(SQLConf.LEGACY_CREATE_HIVE_TABLE_BY_DEFAULT_ENABLED, true)
+      val df = spark.sql("SELECT * FROM src")
```

Review comment: where do we create `src`?
cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax

URL: https://github.com/apache/spark/pull/26736#discussion_r353747215

## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/ErrorPositionSuite.scala

```diff
@@ -23,13 +23,18 @@ import org.scalatest.BeforeAndAfterEach

 import org.apache.spark.sql.{AnalysisException, QueryTest}
 import org.apache.spark.sql.catalyst.util.quietly
-import org.apache.spark.sql.hive.test.TestHiveSingleton
+import org.apache.spark.sql.hive.test.{TestHive, TestHiveSingleton}
+import org.apache.spark.sql.internal.SQLConf

 class ErrorPositionSuite extends QueryTest with TestHiveSingleton with BeforeAndAfterEach {
   import spark.implicits._

+  private val originalCreateHiveTable = TestHive.conf.createHiveTableByDefaultEnabled
+
   override protected def beforeEach(): Unit = {
     super.beforeEach()
+    TestHive.conf.setConf(SQLConf.LEGACY_CREATE_HIVE_TABLE_BY_DEFAULT_ENABLED, true)
```

Review comment: This test suite seems to test SELECT queries mostly. Where do we create tables?
cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax

URL: https://github.com/apache/spark/pull/26736#discussion_r353745710

## File path: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala

```diff
@@ -59,6 +60,23 @@ object TestData {
 }

 class HiveThriftBinaryServerSuite extends HiveThriftJdbcTest {
+
+  private val originalCreateHiveTable = TestHive.conf.createHiveTableByDefaultEnabled
```

Review comment: nit: it's more robust to write `SQLConf.get.createHiveTableByDefaultEnabled`
cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax

URL: https://github.com/apache/spark/pull/26736#discussion_r353744493

## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLParserSuite.scala

```diff
@@ -774,68 +777,60 @@ class DDLParserSuite extends AnalysisTest with SharedSparkSession {

   test("create table - basic") {
     val query = "CREATE TABLE my_table (id int, name string)"
-    val (desc, allowExisting) = extractTableDesc(query)
-    assert(!allowExisting)
-    assert(desc.identifier.database.isEmpty)
-    assert(desc.identifier.table == "my_table")
-    assert(desc.tableType == CatalogTableType.MANAGED)
-    assert(desc.schema == new StructType().add("id", "int").add("name", "string"))
-    assert(desc.partitionColumnNames.isEmpty)
-    assert(desc.bucketSpec.isEmpty)
-    assert(desc.viewText.isEmpty)
-    assert(desc.viewDefaultDatabase.isEmpty)
-    assert(desc.viewQueryColumnNames.isEmpty)
-    assert(desc.storage.locationUri.isEmpty)
-    assert(desc.storage.inputFormat ==
-      Some("org.apache.hadoop.mapred.TextInputFormat"))
-    assert(desc.storage.outputFormat ==
-      Some("org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"))
-    assert(desc.storage.serde == Some("org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"))
-    assert(desc.storage.properties.isEmpty)
-    assert(desc.properties.isEmpty)
-    assert(desc.comment.isEmpty)
+    withCreateTableStatement(query) { state =>
+      assert(state.tableName(0) == "my_table")
+      assert(state.tableSchema == new StructType().add("id", "int").add("name", "string"))
+      assert(state.partitioning.isEmpty)
+      assert(state.bucketSpec.isEmpty)
+      assert(state.properties.isEmpty)
+      assert(state.provider == conf.defaultDataSourceName)
+      assert(state.options.isEmpty)
+      assert(state.location.isEmpty)
+      assert(state.comment.isEmpty)
+      assert(!state.ifNotExists)
+    }
   }

   test("create table - with database name") {
     val query = "CREATE TABLE dbx.my_table (id int, name string)"
-    val (desc, _) = extractTableDesc(query)
-    assert(desc.identifier.database == Some("dbx"))
-    assert(desc.identifier.table == "my_table")
+    withCreateTableStatement(query) { state =>
+      assert(state.tableName(0) == "dbx")
+      assert(state.tableName(1) == "my_table")
+    }
   }

   test("create table - temporary") {
     val query = "CREATE TEMPORARY TABLE tab1 (id int, name string)"
     val e = intercept[ParseException] { parser.parsePlan(query) }
-    assert(e.message.contains("CREATE TEMPORARY TABLE is not supported yet"))
+    assert(e.message.contains("CREATE TEMPORARY TABLE without a provider is not allowed."))
   }

   test("create table - external") {
     val query = "CREATE EXTERNAL TABLE tab1 (id int, name string) LOCATION '/path/to/nowhere'"
-    val (desc, _) = extractTableDesc(query)
-    assert(desc.tableType == CatalogTableType.EXTERNAL)
-    assert(desc.storage.locationUri == Some(new URI("/path/to/nowhere")))
+    val e = intercept[ParseException] { parser.parsePlan(query) }
+    assert(e.message.contains("Operation not allowed: CREATE EXTERNAL TABLE ..."))
   }

   test("create table - if not exists") {
     val query = "CREATE TABLE IF NOT EXISTS tab1 (id int, name string)"
-    val (_, allowExisting) = extractTableDesc(query)
-    assert(allowExisting)
+    withCreateTableStatement(query) { state =>
+      assert(state.ifNotExists)
+    }
   }

   test("create table - comment") {
     val query = "CREATE TABLE my_table (id int, name string) COMMENT 'its hot as hell below'"
-    val (desc, _) = extractTableDesc(query)
-    assert(desc.comment == Some("its hot as hell below"))
+    withCreateTableStatement(query) { state =>
+      assert(state.comment == Some("its hot as hell below"))
+    }
   }

-  test("create table - partitioned columns") {
-    val query = "CREATE TABLE my_table (id int, name string) PARTITIONED BY (month int)"
-    val (desc, _) = extractTableDesc(query)
-    assert(desc.schema == new StructType()
-      .add("id", "int")
-      .add("name", "string")
-      .add("month", "int"))
-    assert(desc.partitionColumnNames == Seq("month"))
+  test("create table - partitioned by transforms") {
```

Review comment: we can keep the test name. The query is `PARTITIONED BY (id)`, so it's still partitioned columns.
cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax

URL: https://github.com/apache/spark/pull/26736#discussion_r353741403

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala

```diff
@@ -1961,6 +1961,15 @@ object SQLConf {
     .booleanConf
     .createWithDefault(false)

+  val LEGACY_CREATE_HIVE_TABLE_BY_DEFAULT_ENABLED =
+    buildConf("spark.sql.legacy.createHiveTableByDefault.enabled")
+      .internal()
+      .doc("When set to true, CREATE TABLE syntax without a provider will use hive " +
+        "instead of the value of spark.sql.sources.default.")
```

Review comment: nit: do not hardcode `spark.sql.sources.default`
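One way to read the nit: build the doc string from the default-source config's key constant instead of the literal, so a future rename cannot leave a stale string behind. A minimal self-contained sketch; the `DefaultSource` object below is a stand-in for the real constant in `org.apache.spark.sql.internal.SQLConf`:

```scala
// Stand-in for the real SQLConf entry that owns the key string; in Spark the
// key would come from the ConfigEntry for spark.sql.sources.default.
object DefaultSource {
  val key: String = "spark.sql.sources.default"
}

// Interpolating the constant keeps the doc string in sync with the key,
// instead of hardcoding the literal a second time.
val doc: String =
  "When set to true, CREATE TABLE syntax without a provider will use hive " +
  s"instead of the value of ${DefaultSource.key}."
```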
cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax

URL: https://github.com/apache/spark/pull/26736#discussion_r353213626

## File path: sql/core/src/test/resources/sql-tests/inputs/postgreSQL/create_view.sql

```diff
@@ -41,7 +41,7 @@ DROP TABLE emp;

 -- These views are left around mainly to exercise special cases in pg_dump.
 -- [SPARK-19842] Informational Referential Integrity Constraints Support in Spark
-CREATE TABLE view_base_table (key int /* PRIMARY KEY */, data varchar(20));
+CREATE TABLE view_base_table (key int /* PRIMARY KEY */, data varchar(20)) STORED AS parquet;
```

Review comment: Why do we test "Hive support is required to CREATE Hive TABLE" in `create_view.sql`? I believe it's a mistake; let's just update the answer file.
cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax

URL: https://github.com/apache/spark/pull/26736#discussion_r353179186

## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveShowCreateTableSuite.scala

```diff
@@ -177,17 +178,24 @@ class HiveShowCreateTableSuite extends ShowCreateTableSuite with TestHiveSinglet
   }

   test("SPARK-24911: keep quotes for nested fields in hive") {
-    withTable("t1") {
-      val createTable = "CREATE TABLE `t1`(`a` STRUCT<`b`: STRING>)"
-      sql(createTable)
-      val shownDDL = sql(s"SHOW CREATE TABLE t1")
-        .head()
-        .getString(0)
-        .split("\n")
-        .head
-      assert(shownDDL == createTable)
+    val originalCreateHiveTable = TestHive.conf.createHiveTableByDefaultEnabled
+    try {
+      TestHive.conf.setConf(SQLConf.LEGACY_CREATE_HIVE_TABLE_BY_DEFAULT_ENABLED, true)
+      withTable("t1") {
+        val createTable = "CREATE TABLE `t1`(`a` STRUCT<`b`: STRING>)"
+        sql(createTable)
+        val shownDDL = sql(s"SHOW CREATE TABLE t1")
+          .head()
+          .getString(0)
+          .split("\n")
+          .head
+        assert(shownDDL == createTable)
```

Review comment: This doesn't matter. SHOW CREATE TABLE needs to return a SQL statement that can run; it's OK to have some extra spaces in the SQL.
cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax

URL: https://github.com/apache/spark/pull/26736#discussion_r353178060

## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/CachedTableSuite.scala

```diff
@@ -34,6 +35,22 @@ import org.apache.spark.util.Utils

 class CachedTableSuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
   import hiveContext._

+  private val originalCreateHiveTable = TestHive.conf.createHiveTableByDefaultEnabled
+
+  override def beforeAll(): Unit = {
+    super.beforeAll()
+    TestHive.conf.setConf(SQLConf.LEGACY_CREATE_HIVE_TABLE_BY_DEFAULT_ENABLED, true)
```

Review comment: do you know how caching can break if we don't create a hive table?
cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax

URL: https://github.com/apache/spark/pull/26736#discussion_r353177380

## File path: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala

```diff
@@ -876,20 +894,27 @@ class HiveThriftHttpServerSuite extends HiveThriftJdbcTest {
   override def mode: ServerMode.Value = ServerMode.http

   test("JDBC query execution") {
-    withJdbcStatement("test") { statement =>
-      val queries = Seq(
-        "SET spark.sql.shuffle.partitions=3",
-        "CREATE TABLE test(key INT, val STRING)",
-        s"LOAD DATA LOCAL INPATH '${TestData.smallKv}' OVERWRITE INTO TABLE test",
-        "CACHE TABLE test")
+    val originalCreateHiveTable = TestHive.conf.createHiveTableByDefaultEnabled
+    try {
+      TestHive.conf.setConf(SQLConf.LEGACY_CREATE_HIVE_TABLE_BY_DEFAULT_ENABLED, true)
+      withJdbcStatement("test") { statement =>
+        val queries = Seq(
+          "SET spark.sql.shuffle.partitions=3",
+          "CREATE TABLE test(key INT, val STRING)",
```

Review comment: can we simply add STORED AS here?
cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax

URL: https://github.com/apache/spark/pull/26736#discussion_r353176791

## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLParserSuite.scala

```diff
@@ -773,7 +774,7 @@ class DDLParserSuite extends AnalysisTest with SharedSparkSession {
   }

   test("create table - basic") {
-    val query = "CREATE TABLE my_table (id int, name string)"
+    val query = "CREATE TABLE my_table (id int, name string) STORED AS textfile"
```

Review comment: shall we update the test assertions instead? We do want to test the basic CREATE TABLE command and make sure the created table is a native parquet table.
cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax

URL: https://github.com/apache/spark/pull/26736#discussion_r353175740

## File path: sql/core/src/test/resources/sql-tests/inputs/postgreSQL/create_view.sql

```diff
@@ -41,7 +41,7 @@ DROP TABLE emp;

 -- These views are left around mainly to exercise special cases in pg_dump.
 -- [SPARK-19842] Informational Referential Integrity Constraints Support in Spark
-CREATE TABLE view_base_table (key int /* PRIMARY KEY */, data varchar(20));
+CREATE TABLE view_base_table (key int /* PRIMARY KEY */, data varchar(20)) STORED AS parquet;
```

Review comment: do you know how the result becomes different? This test shouldn't rely on any features of hive tables.
cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax

URL: https://github.com/apache/spark/pull/26736#discussion_r353174981

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala

```diff
@@ -2578,6 +2587,9 @@ class SQLConf extends Serializable with Logging {
   def exponentLiteralAsDecimalEnabled: Boolean =
     getConf(SQLConf.LEGACY_EXPONENT_LITERAL_AS_DECIMAL_ENABLED)

+  def createHiveTableByDefaultEnabled: Boolean =
+    getConf(SQLConf.LEGACY_CREATE_HIVE_TABLE_BY_DEFAULT_ENABLED)
```

Review comment: nit: we shouldn't create a short method if it's only called once or twice
cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax

URL: https://github.com/apache/spark/pull/26736#discussion_r353173922

## File path: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4

```diff
@@ -352,6 +358,15 @@ tableProvider
     : USING multipartIdentifier
     ;

+createTableClauses
```

Review comment: how about

```
createTableClause
    : (OPTIONS options=tablePropertyList)
    | ...
    ;
```

Then we can reuse it in more places:

```
createTableHeader ... createTableClause*                                          #createTable
createTableHeader ... (createTableClause | rowFormat | createFileFormat | ...)*   #createHiveTable
replaceTableHeader ... createTableClause*
```
cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax

URL: https://github.com/apache/spark/pull/26736#discussion_r352566380

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala

```diff
@@ -1960,6 +1960,15 @@ object SQLConf {
     .booleanConf
     .createWithDefault(false)

+  val LEGACY_RESPECT_HIVE_DEFAULT_PROVIDER_ENABLED =
+    buildConf("spark.sql.legacy.respectHiveDefaultProvider.enabled")
```

Review comment: how about `spark.sql.legacy.createHiveTableByDefault.enabled`?