[GitHub] [spark] cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax

2019-12-05 Thread GitBox
cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use 
default datasource as provider for CREATE TABLE syntax
URL: https://github.com/apache/spark/pull/26736#discussion_r354689784
 
 

 ## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveSerDeSuite.scala
 ##
 @@ -83,15 +83,18 @@ class HiveSerDeSuite extends HiveComparisonTest with PlanTest with BeforeAndAfte
   }
 
   test("Test the default fileformat for Hive-serde tables") {
-    withSQLConf("hive.default.fileformat" -> "orc") {
-      val (desc, exists) = extractTableDesc("CREATE TABLE IF NOT EXISTS fileformat_test (id int)")
+    withSQLConf("hive.default.fileformat" -> "orc",
+      SQLConf.LEGACY_CREATE_HIVE_TABLE_BY_DEFAULT_ENABLED.key -> "true") {
+      val (desc, exists) = extractTableDesc(
+        "CREATE TABLE IF NOT EXISTS fileformat_test (id int)")
       assert(exists)
       assert(desc.storage.inputFormat == Some("org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"))
       assert(desc.storage.outputFormat == Some("org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat"))
       assert(desc.storage.serde == Some("org.apache.hadoop.hive.ql.io.orc.OrcSerde"))
     }
 
-    withSQLConf("hive.default.fileformat" -> "parquet") {
+    withSQLConf("hive.default.fileformat" -> "parquet",
+      SQLConf.LEGACY_CREATE_HIVE_TABLE_BY_DEFAULT_ENABLED.key -> "true") {
 
 Review comment:
   ditto


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax

2019-12-05 Thread GitBox
cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use 
default datasource as provider for CREATE TABLE syntax
URL: https://github.com/apache/spark/pull/26736#discussion_r354689734
 
 

 ## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveSerDeSuite.scala
 ##
 @@ -83,15 +83,18 @@ class HiveSerDeSuite extends HiveComparisonTest with PlanTest with BeforeAndAfte
   }
 
   test("Test the default fileformat for Hive-serde tables") {
-    withSQLConf("hive.default.fileformat" -> "orc") {
-      val (desc, exists) = extractTableDesc("CREATE TABLE IF NOT EXISTS fileformat_test (id int)")
+    withSQLConf("hive.default.fileformat" -> "orc",
+      SQLConf.LEGACY_CREATE_HIVE_TABLE_BY_DEFAULT_ENABLED.key -> "true") {
+      val (desc, exists) = extractTableDesc(
+        "CREATE TABLE IF NOT EXISTS fileformat_test (id int)")
 
 Review comment:
   shall we just add `using hive`?
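   A hedged sketch of that suggestion (assuming the suite's `extractTableDesc` helper and the `USING hive` clause): pin the Hive provider in the statement itself, so no legacy conf toggle is needed.

   ```scala
   // Sketch only, not the final patch: declare the table as a Hive table
   // explicitly with USING hive, so the LEGACY_CREATE_HIVE_TABLE_BY_DEFAULT
   // override in withSQLConf becomes unnecessary.
   withSQLConf("hive.default.fileformat" -> "orc") {
     val (desc, exists) = extractTableDesc(
       "CREATE TABLE IF NOT EXISTS fileformat_test (id int) USING hive")
     assert(exists)
   }
   ```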





[GitHub] [spark] cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax

2019-12-05 Thread GitBox
cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use 
default datasource as provider for CREATE TABLE syntax
URL: https://github.com/apache/spark/pull/26736#discussion_r354689535
 
 

 ## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveShowCreateTableSuite.scala
 ##
 @@ -177,17 +178,24 @@ class HiveShowCreateTableSuite extends ShowCreateTableSuite with TestHiveSinglet
   }
 
   test("SPARK-24911: keep quotes for nested fields in hive") {
-    withTable("t1") {
-      val createTable = "CREATE TABLE `t1`(`a` STRUCT<`b`: STRING>)"
-      sql(createTable)
-      val shownDDL = sql(s"SHOW CREATE TABLE t1")
-        .head()
-        .getString(0)
-        .split("\n")
-        .head
-      assert(shownDDL == createTable)
+    val originalCreateHiveTable = TestHive.conf.createHiveTableByDefaultEnabled
+    try {
+      TestHive.setConf(SQLConf.LEGACY_CREATE_HIVE_TABLE_BY_DEFAULT_ENABLED, true)
+      withTable("t1") {
+        val createTable = "CREATE TABLE `t1`(`a` STRUCT<`b`: STRING>)"
 
 Review comment:
   shall we just add `using hive`?





[GitHub] [spark] cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax

2019-12-05 Thread GitBox
cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use 
default datasource as provider for CREATE TABLE syntax
URL: https://github.com/apache/spark/pull/26736#discussion_r354673906
 
 

 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala
 ##
 @@ -48,6 +49,28 @@ class DDLParserSuite extends AnalysisTest {
     comparePlans(parsePlan(sql), expected, checkAnalysis = false)
   }
 
+  test("SPARK-30098: create table without provider should " +
+    "use default data source under non-legacy mode") {
+    withSQLConf(SQLConf.LEGACY_CREATE_HIVE_TABLE_BY_DEFAULT_ENABLED.key -> "false") {
 
 Review comment:
   let's remove this `withSQLConf` to show that it's the default behavior.
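   A sketch of what this asks for, assuming the non-legacy path is already the suite's default configuration:

   ```scala
   // Sketch only: since "false" is the default value of the legacy conf,
   // the withSQLConf wrapper is redundant and the bare test body already
   // exercises (and documents) the default behavior.
   test("SPARK-30098: create table without provider should " +
     "use default data source under non-legacy mode") {
     // ... same assertions as before, without the withSQLConf wrapper ...
   }
   ```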





[GitHub] [spark] cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax

2019-12-04 Thread GitBox
cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use 
default datasource as provider for CREATE TABLE syntax
URL: https://github.com/apache/spark/pull/26736#discussion_r354122671
 
 

 ## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveMetastoreCatalogSuite.scala
 ##
 @@ -44,9 +44,16 @@ class HiveMetastoreCatalogSuite extends TestHiveSingleton with SQLTestUtils {
   }
 
   test("duplicated metastore relations") {
-    val df = spark.sql("SELECT * FROM src")
-    logInfo(df.queryExecution.toString)
-    df.as('a).join(df.as('b), $"a.key" === $"b.key")
+    val originalCreateHiveTable = TestHive.conf.createHiveTableByDefaultEnabled
+    try {
+      TestHive.conf.setConf(SQLConf.LEGACY_CREATE_HIVE_TABLE_BY_DEFAULT_ENABLED, true)
+      val df = spark.sql("SELECT * FROM src")
 Review comment:
   > how about setting legacy mode for the TestHiveContext
   
   Then none of the tests in the hive module would exercise the new behavior, which decreases test coverage.
   
   I don't see these as "queries". They are just data preparation, and we should create Hive tables for them.





[GitHub] [spark] cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax

2019-12-04 Thread GitBox
cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use 
default datasource as provider for CREATE TABLE syntax
URL: https://github.com/apache/spark/pull/26736#discussion_r353750839
 
 

 ## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveMetastoreCatalogSuite.scala
 ##
 @@ -44,9 +44,16 @@ class HiveMetastoreCatalogSuite extends TestHiveSingleton with SQLTestUtils {
   }
 
   test("duplicated metastore relations") {
-    val df = spark.sql("SELECT * FROM src")
-    logInfo(df.queryExecution.toString)
-    df.as('a).join(df.as('b), $"a.key" === $"b.key")
+    val originalCreateHiveTable = TestHive.conf.createHiveTableByDefaultEnabled
+    try {
+      TestHive.conf.setConf(SQLConf.LEGACY_CREATE_HIVE_TABLE_BY_DEFAULT_ENABLED, true)
+      val df = spark.sql("SELECT * FROM src")
 
 Review comment:
   I believe many changes can be reverted if we fix `TestHive`.





[GitHub] [spark] cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax

2019-12-04 Thread GitBox
cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use 
default datasource as provider for CREATE TABLE syntax
URL: https://github.com/apache/spark/pull/26736#discussion_r353750238
 
 

 ## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveMetastoreCatalogSuite.scala
 ##
 @@ -44,9 +44,16 @@ class HiveMetastoreCatalogSuite extends TestHiveSingleton with SQLTestUtils {
   }
 
   test("duplicated metastore relations") {
-    val df = spark.sql("SELECT * FROM src")
-    logInfo(df.queryExecution.toString)
-    df.as('a).join(df.as('b), $"a.key" === $"b.key")
+    val originalCreateHiveTable = TestHive.conf.createHiveTableByDefaultEnabled
+    try {
+      TestHive.conf.setConf(SQLConf.LEGACY_CREATE_HIVE_TABLE_BY_DEFAULT_ENABLED, true)
+      val df = spark.sql("SELECT * FROM src")
 
 Review comment:
   It's created in `TestHive`. `TestHive` should always create Hive tables; can we update it to write `STORED AS`?
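   A hypothetical sketch of that `TestHive` change (the actual registration code in `TestHive.scala` may differ in detail):

   ```scala
   // Sketch only: pin the Hive text format in TestHive's test-table setup
   // so the `src` table no longer depends on the session's default provider.
   registerTestTable(TestTable("src",
     "CREATE TABLE src (key INT, value STRING) STORED AS TEXTFILE".cmd,
     s"LOAD DATA LOCAL INPATH '${getHiveFile("data/files/kv1.txt")}' INTO TABLE src".cmd))
   ```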





[GitHub] [spark] cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax

2019-12-04 Thread GitBox
cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use 
default datasource as provider for CREATE TABLE syntax
URL: https://github.com/apache/spark/pull/26736#discussion_r353748634
 
 

 ## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveMetastoreCatalogSuite.scala
 ##
 @@ -322,40 +329,47 @@ class DataSourceWithHiveMetastoreCatalogSuite
   }
 
   test("SPARK-27592 set the partitioned bucketed data source table SerDe correctly") {
-    val provider = "parquet"
-    withTable("t") {
-      spark.sql(
-        s"""
-           |CREATE TABLE t
-           |USING $provider
-           |PARTITIONED BY (p)
-           |CLUSTERED BY (key)
-           |SORTED BY (value)
-           |INTO 2 BUCKETS
-           |AS SELECT key, value, cast(key % 3 as string) as p FROM src
 
 Review comment:
   is it because of `src` again?





[GitHub] [spark] cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax

2019-12-04 Thread GitBox
cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use 
default datasource as provider for CREATE TABLE syntax
URL: https://github.com/apache/spark/pull/26736#discussion_r353748296
 
 

 ## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveMetastoreCatalogSuite.scala
 ##
 @@ -44,9 +44,16 @@ class HiveMetastoreCatalogSuite extends TestHiveSingleton with SQLTestUtils {
   }
 
   test("duplicated metastore relations") {
-    val df = spark.sql("SELECT * FROM src")
-    logInfo(df.queryExecution.toString)
-    df.as('a).join(df.as('b), $"a.key" === $"b.key")
+    val originalCreateHiveTable = TestHive.conf.createHiveTableByDefaultEnabled
+    try {
+      TestHive.conf.setConf(SQLConf.LEGACY_CREATE_HIVE_TABLE_BY_DEFAULT_ENABLED, true)
+      val df = spark.sql("SELECT * FROM src")
 
 Review comment:
   where do we create `src`?





[GitHub] [spark] cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax

2019-12-04 Thread GitBox
cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use 
default datasource as provider for CREATE TABLE syntax
URL: https://github.com/apache/spark/pull/26736#discussion_r353747215
 
 

 ## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/ErrorPositionSuite.scala
 ##
 @@ -23,13 +23,18 @@ import org.scalatest.BeforeAndAfterEach
 
 import org.apache.spark.sql.{AnalysisException, QueryTest}
 import org.apache.spark.sql.catalyst.util.quietly
-import org.apache.spark.sql.hive.test.TestHiveSingleton
+import org.apache.spark.sql.hive.test.{TestHive, TestHiveSingleton}
+import org.apache.spark.sql.internal.SQLConf
 
 class ErrorPositionSuite extends QueryTest with TestHiveSingleton with BeforeAndAfterEach {
   import spark.implicits._
 
+  private val originalCreateHiveTable = TestHive.conf.createHiveTableByDefaultEnabled
+
   override protected def beforeEach(): Unit = {
     super.beforeEach()
+    TestHive.conf.setConf(SQLConf.LEGACY_CREATE_HIVE_TABLE_BY_DEFAULT_ENABLED, true)
 
 Review comment:
   This test suite seems to mostly test SELECT queries. Where do we create tables?





[GitHub] [spark] cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax

2019-12-04 Thread GitBox
cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use 
default datasource as provider for CREATE TABLE syntax
URL: https://github.com/apache/spark/pull/26736#discussion_r353745710
 
 

 ## File path: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala
 ##
 @@ -59,6 +60,23 @@ object TestData {
 }
 
 class HiveThriftBinaryServerSuite extends HiveThriftJdbcTest {
+
+  private val originalCreateHiveTable = TestHive.conf.createHiveTableByDefaultEnabled
 
 Review comment:
   nit: it's more robust to write `SQLConf.get.createHiveTableByDefaultEnabled`
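   That nit, sketched (assuming an active session so `SQLConf.get` resolves):

   ```scala
   // Sketch only: SQLConf.get reads the conf of the active session, which
   // keeps the suite correct even if TestHive is not the session in use here.
   private val originalCreateHiveTable = SQLConf.get.createHiveTableByDefaultEnabled
   ```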








[GitHub] [spark] cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax

2019-12-04 Thread GitBox
cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use 
default datasource as provider for CREATE TABLE syntax
URL: https://github.com/apache/spark/pull/26736#discussion_r353744493
 
 

 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLParserSuite.scala
 ##
 @@ -774,68 +777,60 @@ class DDLParserSuite extends AnalysisTest with SharedSparkSession {
 
   test("create table - basic") {
     val query = "CREATE TABLE my_table (id int, name string)"
-    val (desc, allowExisting) = extractTableDesc(query)
-    assert(!allowExisting)
-    assert(desc.identifier.database.isEmpty)
-    assert(desc.identifier.table == "my_table")
-    assert(desc.tableType == CatalogTableType.MANAGED)
-    assert(desc.schema == new StructType().add("id", "int").add("name", "string"))
-    assert(desc.partitionColumnNames.isEmpty)
-    assert(desc.bucketSpec.isEmpty)
-    assert(desc.viewText.isEmpty)
-    assert(desc.viewDefaultDatabase.isEmpty)
-    assert(desc.viewQueryColumnNames.isEmpty)
-    assert(desc.storage.locationUri.isEmpty)
-    assert(desc.storage.inputFormat ==
-      Some("org.apache.hadoop.mapred.TextInputFormat"))
-    assert(desc.storage.outputFormat ==
-      Some("org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"))
-    assert(desc.storage.serde == Some("org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"))
-    assert(desc.storage.properties.isEmpty)
-    assert(desc.properties.isEmpty)
-    assert(desc.comment.isEmpty)
+    withCreateTableStatement(query) { state =>
+      assert(state.tableName(0) == "my_table")
+      assert(state.tableSchema == new StructType().add("id", "int").add("name", "string"))
+      assert(state.partitioning.isEmpty)
+      assert(state.bucketSpec.isEmpty)
+      assert(state.properties.isEmpty)
+      assert(state.provider == conf.defaultDataSourceName)
+      assert(state.options.isEmpty)
+      assert(state.location.isEmpty)
+      assert(state.comment.isEmpty)
+      assert(!state.ifNotExists)
+    }
   }
 
   test("create table - with database name") {
     val query = "CREATE TABLE dbx.my_table (id int, name string)"
-    val (desc, _) = extractTableDesc(query)
-    assert(desc.identifier.database == Some("dbx"))
-    assert(desc.identifier.table == "my_table")
+    withCreateTableStatement(query) { state =>
+      assert(state.tableName(0) == "dbx")
+      assert(state.tableName(1) == "my_table")
+    }
   }
 
   test("create table - temporary") {
     val query = "CREATE TEMPORARY TABLE tab1 (id int, name string)"
     val e = intercept[ParseException] { parser.parsePlan(query) }
-    assert(e.message.contains("CREATE TEMPORARY TABLE is not supported yet"))
+    assert(e.message.contains("CREATE TEMPORARY TABLE without a provider is not allowed."))
   }
 
   test("create table - external") {
     val query = "CREATE EXTERNAL TABLE tab1 (id int, name string) LOCATION '/path/to/nowhere'"
-    val (desc, _) = extractTableDesc(query)
-    assert(desc.tableType == CatalogTableType.EXTERNAL)
-    assert(desc.storage.locationUri == Some(new URI("/path/to/nowhere")))
+    val e = intercept[ParseException] { parser.parsePlan(query) }
+    assert(e.message.contains("Operation not allowed: CREATE EXTERNAL TABLE ..."))
  }
 
   test("create table - if not exists") {
     val query = "CREATE TABLE IF NOT EXISTS tab1 (id int, name string)"
-    val (_, allowExisting) = extractTableDesc(query)
-    assert(allowExisting)
+    withCreateTableStatement(query) { state =>
+      assert(state.ifNotExists)
+    }
   }
 
   test("create table - comment") {
     val query = "CREATE TABLE my_table (id int, name string) COMMENT 'its hot as hell below'"
-    val (desc, _) = extractTableDesc(query)
-    assert(desc.comment == Some("its hot as hell below"))
+    withCreateTableStatement(query) { state =>
+      assert(state.comment == Some("its hot as hell below"))
+    }
  }
 
-  test("create table - partitioned columns") {
-    val query = "CREATE TABLE my_table (id int, name string) PARTITIONED BY (month int)"
-    val (desc, _) = extractTableDesc(query)
-    assert(desc.schema == new StructType()
-      .add("id", "int")
-      .add("name", "string")
-      .add("month", "int"))
-    assert(desc.partitionColumnNames == Seq("month"))
+  test("create table - partitioned by transforms") {
 
 Review comment:
   we can keep the test name. The query is `PARTITIONED BY (id)`, so it's still partitioned columns.



[GitHub] [spark] cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax

2019-12-04 Thread GitBox
cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use 
default datasource as provider for CREATE TABLE syntax
URL: https://github.com/apache/spark/pull/26736#discussion_r353741403
 
 

 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
 ##
 @@ -1961,6 +1961,15 @@ object SQLConf {
       .booleanConf
       .createWithDefault(false)
 
+  val LEGACY_CREATE_HIVE_TABLE_BY_DEFAULT_ENABLED =
+    buildConf("spark.sql.legacy.createHiveTableByDefault.enabled")
+      .internal()
+      .doc("When set to true, CREATE TABLE syntax without a provider will use hive " +
+        "instead of the value of spark.sql.sources.default.")
 
 Review comment:
   nit: do not hardcode `spark.sql.sources.default`
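   A sketch of the nit (assuming this sits inside `object SQLConf`, where `DEFAULT_DATA_SOURCE_NAME` is in scope):

   ```scala
   // Sketch only: interpolate the conf constant instead of hardcoding the
   // key string, so a future rename cannot leave the doc text stale.
   .doc("When set to true, CREATE TABLE syntax without a provider will use hive " +
     s"instead of the value of ${DEFAULT_DATA_SOURCE_NAME.key}.")
   ```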





[GitHub] [spark] cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax

2019-12-03 Thread GitBox
cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use 
default datasource as provider for CREATE TABLE syntax
URL: https://github.com/apache/spark/pull/26736#discussion_r353213626
 
 

 ## File path: sql/core/src/test/resources/sql-tests/inputs/postgreSQL/create_view.sql
 ##
 @@ -41,7 +41,7 @@ DROP TABLE emp;
 -- These views are left around mainly to exercise special cases in pg_dump.
 
 -- [SPARK-19842] Informational Referential Integrity Constraints Support in Spark
-CREATE TABLE view_base_table (key int /* PRIMARY KEY */, data varchar(20));
+CREATE TABLE view_base_table (key int /* PRIMARY KEY */, data varchar(20)) STORED AS parquet;
 
 Review comment:
   Why do we test "Hive support is required to CREATE Hive TABLE" in `create_view.sql`? I believe it's a mistake; let's just update the answer file.





[GitHub] [spark] cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax

2019-12-03 Thread GitBox
cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use 
default datasource as provider for CREATE TABLE syntax
URL: https://github.com/apache/spark/pull/26736#discussion_r353179186
 
 

 ## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveShowCreateTableSuite.scala
 ##
 @@ -177,17 +178,24 @@ class HiveShowCreateTableSuite extends ShowCreateTableSuite with TestHiveSinglet
   }
 
   test("SPARK-24911: keep quotes for nested fields in hive") {
-    withTable("t1") {
-      val createTable = "CREATE TABLE `t1`(`a` STRUCT<`b`: STRING>)"
-      sql(createTable)
-      val shownDDL = sql(s"SHOW CREATE TABLE t1")
-        .head()
-        .getString(0)
-        .split("\n")
-        .head
-      assert(shownDDL == createTable)
+    val originalCreateHiveTable = TestHive.conf.createHiveTableByDefaultEnabled
+    try {
+      TestHive.conf.setConf(SQLConf.LEGACY_CREATE_HIVE_TABLE_BY_DEFAULT_ENABLED, true)
+      withTable("t1") {
+        val createTable = "CREATE TABLE `t1`(`a` STRUCT<`b`: STRING>)"
+        sql(createTable)
+        val shownDDL = sql(s"SHOW CREATE TABLE t1")
+          .head()
+          .getString(0)
+          .split("\n")
+          .head
+        assert(shownDDL == createTable)
 
 Review comment:
   This doesn't matter. SHOW CREATE TABLE needs to return SQL that can run; it's OK to have some extra spaces in it.





[GitHub] [spark] cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax

2019-12-03 Thread GitBox
cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use 
default datasource as provider for CREATE TABLE syntax
URL: https://github.com/apache/spark/pull/26736#discussion_r353178060
 
 

 ## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/CachedTableSuite.scala
 ##
 @@ -34,6 +35,22 @@ import org.apache.spark.util.Utils
 class CachedTableSuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
   import hiveContext._
 
+  private val originalCreateHiveTable = TestHive.conf.createHiveTableByDefaultEnabled
+
+  override def beforeAll(): Unit = {
+    super.beforeAll()
+    TestHive.conf.setConf(SQLConf.LEGACY_CREATE_HIVE_TABLE_BY_DEFAULT_ENABLED, true)
 
 Review comment:
   do you know how the cache can be broken if we don't create a Hive table?





[GitHub] [spark] cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax

2019-12-03 Thread GitBox
cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use 
default datasource as provider for CREATE TABLE syntax
URL: https://github.com/apache/spark/pull/26736#discussion_r353177380
 
 

 ## File path: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala
 ##
 @@ -876,20 +894,27 @@ class HiveThriftHttpServerSuite extends HiveThriftJdbcTest {
   override def mode: ServerMode.Value = ServerMode.http
 
   test("JDBC query execution") {
-    withJdbcStatement("test") { statement =>
-      val queries = Seq(
-        "SET spark.sql.shuffle.partitions=3",
-        "CREATE TABLE test(key INT, val STRING)",
-        s"LOAD DATA LOCAL INPATH '${TestData.smallKv}' OVERWRITE INTO TABLE test",
-        "CACHE TABLE test")
+    val originalCreateHiveTable = TestHive.conf.createHiveTableByDefaultEnabled
+    try {
+      TestHive.conf.setConf(SQLConf.LEGACY_CREATE_HIVE_TABLE_BY_DEFAULT_ENABLED, true)
+      withJdbcStatement("test") { statement =>
+        val queries = Seq(
+          "SET spark.sql.shuffle.partitions=3",
+          "CREATE TABLE test(key INT, val STRING)",
 
 Review comment:
   can we simply add STORED AS here?
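   A sketch of that suggestion (hedged: `LOAD DATA` targets a Hive text-format table, which `STORED AS TEXTFILE` makes explicit), so the legacy-conf toggle around the test could be dropped:

   ```scala
   // Sketch only: pin the Hive text format on the table instead of relying
   // on the session-wide legacy CREATE TABLE default.
   val queries = Seq(
     "SET spark.sql.shuffle.partitions=3",
     "CREATE TABLE test(key INT, val STRING) STORED AS TEXTFILE",
     s"LOAD DATA LOCAL INPATH '${TestData.smallKv}' OVERWRITE INTO TABLE test",
     "CACHE TABLE test")
   ```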





[GitHub] [spark] cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax

2019-12-03 Thread GitBox
cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use 
default datasource as provider for CREATE TABLE syntax
URL: https://github.com/apache/spark/pull/26736#discussion_r353176791
 
 

 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLParserSuite.scala
 ##
 @@ -773,7 +774,7 @@ class DDLParserSuite extends AnalysisTest with SharedSparkSession {
   }
 
   test("create table - basic") {
-    val query = "CREATE TABLE my_table (id int, name string)"
+    val query = "CREATE TABLE my_table (id int, name string) STORED AS textfile"
 
 Review comment:
   shall we update the test assertions? We do want to test the basic CREATE TABLE command and make sure the created table is a native parquet table.





[GitHub] [spark] cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax

2019-12-03 Thread GitBox
cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use 
default datasource as provider for CREATE TABLE syntax
URL: https://github.com/apache/spark/pull/26736#discussion_r353175740
 
 

 ##
 File path: 
sql/core/src/test/resources/sql-tests/inputs/postgreSQL/create_view.sql
 ##
 @@ -41,7 +41,7 @@ DROP TABLE emp;
 -- These views are left around mainly to exercise special cases in pg_dump.
 
 -- [SPARK-19842] Informational Referential Integrity Constraints Support in Spark
-CREATE TABLE view_base_table (key int /* PRIMARY KEY */, data varchar(20));
+CREATE TABLE view_base_table (key int /* PRIMARY KEY */, data varchar(20)) STORED AS parquet;
 
 Review comment:
   Do you know how the result becomes different? This test shouldn't rely on any features of Hive tables.





[GitHub] [spark] cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax

2019-12-03 Thread GitBox
cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use 
default datasource as provider for CREATE TABLE syntax
URL: https://github.com/apache/spark/pull/26736#discussion_r353174981
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
 ##
 @@ -2578,6 +2587,9 @@ class SQLConf extends Serializable with Logging {
   def exponentLiteralAsDecimalEnabled: Boolean =
     getConf(SQLConf.LEGACY_EXPONENT_LITERAL_AS_DECIMAL_ENABLED)
 
+  def createHiveTableByDefaultEnabled: Boolean =
+    getConf(SQLConf.LEGACY_CREATE_HIVE_TABLE_BY_DEFAULT_ENABLED)
 
 Review comment:
   nit: we shouldn't create a short method if it's only called 1 or 2 times
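   In other words, drop the one-line accessor and call `getConf` directly at the call site (a sketch; `conf` stands for the active `SQLConf` instance and the surrounding logic is illustrative):
   ```scala
   // Sketch of the inlined form: no wrapper method on SQLConf needed.
   val createHiveTable =
     conf.getConf(SQLConf.LEGACY_CREATE_HIVE_TABLE_BY_DEFAULT_ENABLED)
   if (createHiveTable) {
     // legacy behavior: plain CREATE TABLE produces a Hive serde table
   }
   ```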





[GitHub] [spark] cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax

2019-12-03 Thread GitBox
cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use 
default datasource as provider for CREATE TABLE syntax
URL: https://github.com/apache/spark/pull/26736#discussion_r353173922
 
 

 ##
 File path: 
sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
 ##
 @@ -352,6 +358,15 @@ tableProvider
 : USING multipartIdentifier
 ;
 
+createTableClauses
 
 Review comment:
   how about
   ```
   createTableClause
     : (OPTIONS options=tablePropertyList) | ...
     ;
   ```
   Then we can reuse it in more places
   ```
   createTableHeader ... createTableClause*    #createTable
   createTableHeader ... (createTableClause | rowFormat | createFileFormat | ...)*    #createHiveTable
   replaceTableHeader ... createTableClause*
   ```





[GitHub] [spark] cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax

2019-12-02 Thread GitBox
cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use 
default datasource as provider for CREATE TABLE syntax
URL: https://github.com/apache/spark/pull/26736#discussion_r352566380
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
 ##
 @@ -1960,6 +1960,15 @@ object SQLConf {
   .booleanConf
   .createWithDefault(false)
 
+  val LEGACY_RESPECT_HIVE_DEFAULT_PROVIDER_ENABLED =
+buildConf("spark.sql.legacy.respectHiveDefaultProvider.enabled")
 
 Review comment:
   How about `spark.sql.legacy.createHiveTableByDefault.enabled`?
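   Under the suggested name, the conf definition would read roughly as follows (a sketch; the `.internal()` flag and doc string are illustrative assumptions, not confirmed details of the PR):
   ```scala
   val LEGACY_CREATE_HIVE_TABLE_BY_DEFAULT_ENABLED =
     buildConf("spark.sql.legacy.createHiveTableByDefault.enabled")
       .internal()
       .doc("When true, CREATE TABLE without a USING or STORED AS clause " +
         "creates a Hive serde table instead of a native data source table.")
       .booleanConf
       .createWithDefault(false)
   ```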

