cxzl25 commented on a change in pull request #29316:
URL: https://github.com/apache/spark/pull/29316#discussion_r488744882
##########
File path:
sql/core/src/test/scala/org/apache/spark/sql/sources/InsertSuite.scala
##########
@@ -866,6 +866,28 @@ class InsertSuite extends DataSourceTest with SharedSparkSession {
}.getMessage
assert(message.contains("LOCAL is supported only with file: scheme"))
}
+
+ test("SPARK-32508 " +
+ "Disallow empty part col values in partition spec before static partition writing") {
+ withTable("insertTable") {
+ sql(
+ """
+ |CREATE TABLE insertTable(i int, part1 string, part2 string) USING PARQUET
+ |PARTITIONED BY (part1, part2)
+ """.stripMargin)
+ val msg = "Partition spec is invalid"
+ assert(intercept[AnalysisException] {
+ sql("INSERT INTO TABLE insertTable PARTITION(part1=1, part2='') SELECT 1")
+ }.getMessage.contains(msg))
+ assert(intercept[AnalysisException] {
+ sql("INSERT INTO TABLE insertTable PARTITION(part1='', part2) SELECT 1 ,'' AS part2")
+ }.getMessage.contains(msg))
+
+ sql("INSERT INTO TABLE insertTable PARTITION(part1='1', part2='2') SELECT 1")
+ sql("INSERT INTO TABLE insertTable PARTITION(part1='1', part2) SELECT 1 ,'2' AS part2")
+ sql("INSERT INTO TABLE insertTable PARTITION(part1='1', part2) SELECT 1 ,'' AS part2")
Review comment:
Generally speaking, an empty partition value is meaningless, so an empty
static partition value is not allowed.
With dynamic partitioning, the user may not know that the partition field
is null or empty, and the data ends up written to the
`__HIVE_DEFAULT_PARTITION__` partition.
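
To make the distinction concrete, here is a minimal sketch of the rule, not the code in this PR. `StaticSpecCheck` and `validateStaticSpec` are hypothetical names, and the spec is modelled as a map in which a static column carries `Some(value)` and a dynamic column carries `None`. Only static values are checked:

```scala
// Hypothetical helper for illustration only; this is not Spark's implementation.
object StaticSpecCheck {
  // A static partition column carries Some(value); a dynamic one carries None.
  def validateStaticSpec(spec: Map[String, Option[String]]): Unit = {
    // Collect only the static columns whose value is the empty string.
    val emptyCols = spec.collect { case (col, Some(v)) if v.isEmpty => col }
    // require throws IllegalArgumentException when the condition is false.
    require(
      emptyCols.isEmpty,
      s"Partition spec is invalid: empty value for column(s) ${emptyCols.mkString(", ")}")
  }
}
```

With this model, `PARTITION(part1='1', part2)` passes because `part2` is dynamic, while `PARTITION(part1=1, part2='')` is rejected before any data is written.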
`listPartitions`
```sql
spark-sql> show partitions inserttable ;
part1=1/part2=__HIVE_DEFAULT_PARTITION__
Time taken: 0.2 seconds, Fetched 1 row(s)
spark-sql> desc formatted inserttable partition(part1='1',part2='');
Error in query: Partition spec is invalid. The spec ([part1=1, part2=]) contains an empty partition column value;
spark-sql> desc formatted inserttable partition(part1='1',part2='__HIVE_DEFAULT_PARTITION__');
col_name data_type comment
...
Time taken: 0.348 seconds, Fetched 27 row(s)
```
The partition value the user sees is `__HIVE_DEFAULT_PARTITION__`, so the
user will not pass an empty partition value when querying the partition details.
`loadPartition`
In `DynamicPartitionDataWriter#partitionPathExpression`, a null or empty
partition value is converted to `__HIVE_DEFAULT_PARTITION__`, so the write
succeeds without adding an earlier conversion.
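
As a rough illustration of that conversion, here is a simplified sketch. `PartitionPathSketch`, `partitionDirName`, and `partitionPath` are hypothetical names, and escaping of special characters in partition values is deliberately left out:

```scala
// Simplified sketch for illustration; names are hypothetical and escaping is omitted.
object PartitionPathSketch {
  // Same literal as the partition name shown in the `show partitions` output above.
  val DefaultPartitionName = "__HIVE_DEFAULT_PARTITION__"

  // Build one "col=value" path segment, substituting the default name
  // when the value is null or empty.
  def partitionDirName(col: String, value: String): String = {
    val shown = if (value == null || value.isEmpty) DefaultPartitionName else value
    s"$col=$shown"
  }

  // Join the segments in partition-column order.
  def partitionPath(values: Seq[(String, String)]): String =
    values.map { case (col, v) => partitionDirName(col, v) }.mkString("/")
}
```

For example, `PartitionPathSketch.partitionPath(Seq("part1" -> "1", "part2" -> ""))` yields `part1=1/part2=__HIVE_DEFAULT_PARTITION__`, matching the partition listed by `show partitions`.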