cxzl25 commented on a change in pull request #29316:
URL: https://github.com/apache/spark/pull/29316#discussion_r488744882



##########
File path: 
sql/core/src/test/scala/org/apache/spark/sql/sources/InsertSuite.scala
##########
@@ -866,6 +866,28 @@ class InsertSuite extends DataSourceTest with SharedSparkSession {
     }.getMessage
     assert(message.contains("LOCAL is supported only with file: scheme"))
   }
+
+  test("SPARK-32508 " +
+    "Disallow empty part col values in partition spec before static partition writing") {
+    withTable("insertTable") {
+      sql(
+        """
+          |CREATE TABLE insertTable(i int, part1 string, part2 string) USING PARQUET
+          |PARTITIONED BY (part1, part2)
+            """.stripMargin)
+      val msg = "Partition spec is invalid"
+      assert(intercept[AnalysisException] {
+        sql("INSERT INTO TABLE insertTable PARTITION(part1=1, part2='') SELECT 1")
+      }.getMessage.contains(msg))
+      assert(intercept[AnalysisException] {
+        sql("INSERT INTO TABLE insertTable PARTITION(part1='', part2) SELECT 1 ,'' AS part2")
+      }.getMessage.contains(msg))
+
+      sql("INSERT INTO TABLE insertTable PARTITION(part1='1', part2='2') SELECT 1")
+      sql("INSERT INTO TABLE insertTable PARTITION(part1='1', part2) SELECT 1 ,'2' AS part2")
+      sql("INSERT INTO TABLE insertTable PARTITION(part1='1', part2) SELECT 1 ,'' AS part2")

Review comment:
       Generally speaking, an empty partition value is meaningless, so an empty static partition value is not allowed.
   With dynamic partitioning, the user may not know that the partition field is null or empty, and such rows ultimately land in the `__HIVE_DEFAULT_PARTITION__` partition.
   
   `listPartitions`
   ```sql
   spark-sql> show partitions inserttable ;
   part1=1/part2=__HIVE_DEFAULT_PARTITION__
   Time taken: 0.2 seconds, Fetched 1 row(s)
   spark-sql> desc formatted inserttable partition(part1='1',part2='');
   Error in query: Partition spec is invalid. The spec ([part1=1, part2=]) contains an empty partition column value;
   spark-sql> desc formatted inserttable partition(part1='1',part2='__HIVE_DEFAULT_PARTITION__');
   col_name     data_type       comment
   ...
   Time taken: 0.348 seconds, Fetched 27 row(s)
   ```
   The partition value the user sees is `__HIVE_DEFAULT_PARTITION__`, so the user will not specify an empty partition value when querying partition details.
   
   `loadPartition`
   Because `DynamicPartitionDataWriter#partitionPathExpression` already converts a null or empty partition value to `__HIVE_DEFAULT_PARTITION__`, the write succeeds without adding an extra early conversion.
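   A hedged sketch of that conversion (illustrative names, not the actual `DynamicPartitionDataWriter` code): when the partition path is built, a null or empty dynamic partition value is rendered as the Hive default partition name.

   ```scala
   // Illustrative sketch: turning a dynamic partition value into a path
   // segment, mapping null/empty to the Hive default partition name.
   object PartitionPathSketch {
     val DefaultPartitionName = "__HIVE_DEFAULT_PARTITION__"

     def partitionPathSegment(col: String, value: String): String = {
       val v = if (value == null || value.isEmpty) DefaultPartitionName else value
       s"$col=$v"
     }
   }
   ```

   This matches the `part1=1/part2=__HIVE_DEFAULT_PARTITION__` layout shown by `show partitions` above.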
   
   
   
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
