[GitHub] [spark] MaxGekk commented on a change in pull request #34259: [SPARK-36949][SQL] Disallow Hive provider tables with ANSI intervals

GitBox Wed, 13 Oct 2021 11:32:17 -0700


MaxGekk commented on a change in pull request #34259:
URL: https://github.com/apache/spark/pull/34259#discussion_r727321938




##########
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala
##########
@@ -932,7 +933,9 @@ object DDLUtils extends Logging {
       _.toLowerCase(Locale.ROOT) match {
         case HIVE_PROVIDER =>
           val serde = table.storage.serde
-          if (serde == HiveSerDe.sourceToSerDe("orc").get.serde) {
+          if (schema.exists(_.dataType.isInstanceOf[AnsiIntervalType])) {
+            throw hiveTableWithAnsiIntervalsError(table.identifier.toString)
+          } else if (serde == HiveSerDe.sourceToSerDe("orc").get.serde) {

Review comment:
       I have realized that I have some concerns about the placement of the
check. This function is supposed to check column names, not column types.

##########
File path: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveParquetSuite.scala
##########
@@ -144,4 +144,21 @@ class HiveParquetSuite extends QueryTest with ParquetTest with TestHiveSingleton
             .plus(123456, ChronoUnit.MICROS)))
     }
   }
+
+  test("SPARK-36949: Disallow tables with ANSI intervals stored as parquet") {
+    val tbl = "tbl_with_ansi_intervals"
+    withTable(tbl) {
+      val errMsg = intercept[UnsupportedOperationException] {
+        sql(
+          s"""
+             |CREATE TABLE $tbl
+             |STORED AS PARQUET

Review comment:
       @cloud-fan Could you give me an example, please? The PR
https://github.com/apache/spark/pull/34215 already added a test.
   

##########
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala
##########
@@ -932,7 +933,9 @@ object DDLUtils extends Logging {
       _.toLowerCase(Locale.ROOT) match {
         case HIVE_PROVIDER =>
           val serde = table.storage.serde
-          if (serde == HiveSerDe.sourceToSerDe("orc").get.serde) {
+          if (schema.exists(_.dataType.isInstanceOf[AnsiIntervalType])) {
+            throw hiveTableWithAnsiIntervalsError(table.identifier.toString)
+          } else if (serde == HiveSerDe.sourceToSerDe("orc").get.serde) {

Review comment:
       I am thinking of a new private function, `checkDataColTypes` or
`checkColumnTypes`, WDYT? It would be called from the same places where
`checkDataColNames()` is called.
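
A minimal sketch of that idea, using simplified stand-in types rather than Spark's real classes. `Field`, `Table`, `ColumnChecks`, the comma rule in the name check, and the error message text are all hypothetical and only illustrate keeping the type check separate from the name check; the actual change in `DDLUtils` may look different.

```scala
import java.util.Locale

// Simplified stand-ins for Spark's data types (NOT the real Spark classes).
sealed trait DataType
case object IntType extends DataType
sealed trait AnsiIntervalType extends DataType
case object YearMonthIntervalType extends AnsiIntervalType
case object DayTimeIntervalType extends AnsiIntervalType

// Hypothetical, reduced models of a column and a catalog table.
final case class Field(name: String, dataType: DataType)
final case class Table(identifier: String, provider: Option[String], schema: Seq[Field])

object ColumnChecks {
  val HiveProvider = "hive"

  // Name check only: the comma rule here is an illustrative assumption.
  def checkDataColNames(table: Table): Unit =
    table.schema.foreach { f =>
      if (f.name.contains(",")) {
        throw new IllegalArgumentException(s"Invalid column name: ${f.name}")
      }
    }

  // Type check only: reject ANSI interval columns for Hive provider tables.
  def checkColumnTypes(table: Table): Unit = {
    val isHive = table.provider.exists(_.toLowerCase(Locale.ROOT) == HiveProvider)
    if (isHive && table.schema.exists(_.dataType.isInstanceOf[AnsiIntervalType])) {
      throw new UnsupportedOperationException(
        s"Hive table ${table.identifier} with ANSI intervals is not supported")
    }
  }
}
```

With this split, each call site of `checkDataColNames()` would call `checkColumnTypes()` as well, and a table whose provider is not Hive passes the type check even with interval columns.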

##########
File path: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveParquetSuite.scala
##########
@@ -144,4 +144,21 @@ class HiveParquetSuite extends QueryTest with ParquetTest with TestHiveSingleton
             .plus(123456, ChronoUnit.MICROS)))
     }
   }
+
+  test("SPARK-36949: Disallow tables with ANSI intervals stored as parquet") {
+    val tbl = "tbl_with_ansi_intervals"
+    withTable(tbl) {
+      val errMsg = intercept[UnsupportedOperationException] {
+        sql(
+          s"""
+             |CREATE TABLE $tbl
+             |STORED AS PARQUET

Review comment:
       The issue is in Hive's SerDe/Metastore. So, when you insert ANSI
intervals into a table where the Hive SerDe is not involved (`provider` is
parquet, for instance), and we store the schema in Spark's specific format in
the Hive external catalog, I wonder why you are surprised that INSERT works
well.
   
   In that case, we use the Hive MetaStore only as a store for our schema. HMS
is not aware of our types, right?

##########
File path: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveParquetSuite.scala
##########
@@ -144,4 +144,21 @@ class HiveParquetSuite extends QueryTest with ParquetTest with TestHiveSingleton
             .plus(123456, ChronoUnit.MICROS)))
     }
   }
+
+  test("SPARK-36949: Disallow tables with ANSI intervals stored as parquet") {
+    val tbl = "tbl_with_ansi_intervals"
+    withTable(tbl) {
+      val errMsg = intercept[UnsupportedOperationException] {
+        sql(
+          s"""
+             |CREATE TABLE $tbl
+             |STORED AS PARQUET

Review comment:
       > we can put the test in SQLQuerySuite under sql/hive with parquet serde
only.
   
   Why not put it in HiveDDLSuite, for instance? This PR is mostly about
creating/modifying a table (data definition), not about querying.

##########
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala
##########
@@ -932,7 +933,9 @@ object DDLUtils extends Logging {
       _.toLowerCase(Locale.ROOT) match {
         case HIVE_PROVIDER =>
           val serde = table.storage.serde
-          if (serde == HiveSerDe.sourceToSerDe("orc").get.serde) {
+          if (schema.exists(_.dataType.isInstanceOf[AnsiIntervalType])) {
+            throw hiveTableWithAnsiIntervalsError(table.identifier.toString)
+          } else if (serde == HiveSerDe.sourceToSerDe("orc").get.serde) {

Review comment:
       Renamed the private methods to `checkTableColumns`.



