MaxGekk opened a new pull request #31326:
URL: https://github.com/apache/spark/pull/31326
### What changes were proposed in this pull request?
In this PR, I propose to convert `null` partition values to
`"__HIVE_DEFAULT_PARTITION__"` before storing them in the `In-Memory` catalog
internally. Currently, the `In-Memory` catalog represents null partitions as
`"__HIVE_DEFAULT_PARTITION__"` in the file system but as `null` values in
memory, which can cause issues such as SPARK-34203.
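Below is a minimal sketch of the proposed conversion, assuming partition specs are plain `Map[String, String]` values. The object `NullPartitionNormalization` and its methods are hypothetical names for illustration, not the actual `InMemoryCatalog` code. The point it shows is that once `null` is mapped to the placeholder, the in-memory spec and the Hive-style directory name agree:

```scala
// Hypothetical sketch, not the actual InMemoryCatalog change.
object NullPartitionNormalization {
  // Placeholder name quoted in this PR description.
  val DefaultPartitionName: String = "__HIVE_DEFAULT_PARTITION__"

  // Local alias: partition column name -> partition value.
  type TablePartitionSpec = Map[String, String]

  // Convert null partition values to the placeholder before the spec is stored.
  def normalize(spec: TablePartitionSpec): TablePartitionSpec =
    spec.map { case (column, value) =>
      column -> (if (value == null) DefaultPartitionName else value)
    }

  // Hive-style directory fragment such as "p1=__HIVE_DEFAULT_PARTITION__".
  // Assumption: no escaping of special characters, unlike real path handling.
  def pathFragment(spec: TablePartitionSpec, partitionColumns: Seq[String]): String = {
    val normalized = normalize(spec)
    partitionColumns.map(c => s"$c=${normalized(c)}").mkString("/")
  }
}
```

With this, `normalize(Map("p1" -> null))` yields `Map("p1" -> "__HIVE_DEFAULT_PARTITION__")`, the same value that appears in the directory name `p1=__HIVE_DEFAULT_PARTITION__`.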
### Why are the changes needed?
`InMemoryCatalog` stores partitions in the file system in the Hive-compatible
form: for instance, it converts a `null` partition value to
`"__HIVE_DEFAULT_PARTITION__"`, but at the same time it keeps `null` as is
internally. The example below demonstrates the resulting issue: the `null`
partition cannot be dropped.
```
$ ./bin/spark-shell -c spark.sql.catalogImplementation=in-memory
```
```scala
scala> spark.conf.get("spark.sql.catalogImplementation")
res0: String = in-memory
scala> sql("CREATE TABLE tbl (col1 INT, p1 STRING) USING parquet PARTITIONED BY (p1)")
res1: org.apache.spark.sql.DataFrame = []
scala> sql("INSERT OVERWRITE TABLE tbl VALUES (0, null)")
res2: org.apache.spark.sql.DataFrame = []
scala> sql("ALTER TABLE tbl DROP PARTITION (p1 = null)")
org.apache.spark.sql.catalyst.analysis.NoSuchPartitionsException: The following partitions not found in table 'tbl' database 'default':
Map(p1 -> null)
  at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.dropPartitions(InMemoryCatalog.scala:440)
```
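A toy model of why the lookup above can fail, assuming the catalog finds partitions by exact equality of their specs, and that the stored key already carries the placeholder while the `DROP PARTITION` request still carries `null`. `NullPartitionLookupDemo` is a hypothetical object that only mimics the reported behavior; it is not Spark code:

```scala
// Toy model, not Spark code: partitions are kept in a Set of specs and a
// lookup succeeds only when the requested spec is exactly equal to a stored one.
object NullPartitionLookupDemo {
  val DefaultPartitionName = "__HIVE_DEFAULT_PARTITION__"

  type Spec = Map[String, String]

  // Assumption: the registered partition uses the Hive-style placeholder.
  val storedPartitions: Set[Spec] = Set(Map("p1" -> DefaultPartitionName))

  // The spec coming from `DROP PARTITION (p1 = null)` keeps a raw null.
  val requested: Spec = Map("p1" -> null)

  // Map a raw null to the placeholder before comparing specs.
  def normalize(spec: Spec): Spec =
    spec.map { case (k, v) => k -> Option(v).getOrElse(DefaultPartitionName) }

  // Raw comparison: Map("p1" -> null) never equals the stored spec.
  def existsRaw: Boolean = storedPartitions.contains(requested)

  // Normalized comparison: both sides use the same representation.
  def existsNormalized: Boolean = storedPartitions.contains(normalize(requested))
}
```

Normalizing the requested spec the same way the stored one was written makes both keys identical, which is the effect this PR aims for by keeping a single representation of the `null` partition.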
### Does this PR introduce _any_ user-facing change?
Yes. After the changes, `ALTER TABLE .. DROP PARTITION` can drop the `null`
partition in the `In-Memory` catalog:
```scala
scala> spark.table("tbl").show(false)
+----+----+
|col1|p1  |
+----+----+
|0   |null|
+----+----+
scala> sql("ALTER TABLE tbl DROP PARTITION (p1 = null)")
res4: org.apache.spark.sql.DataFrame = []
scala> spark.table("tbl").show(false)
+----+---+
|col1|p1 |
+----+---+
+----+---+
```
### How was this patch tested?
Added a new test to `DDLSuite`, run via:
```
$ build/sbt -Phive -Phive-thriftserver "test:testOnly *CatalogedDDLSuite"
```
Authored-by: Max Gekk <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit bfc023501379d28ae2db8708928f4e658ccaa07f)
Signed-off-by: Max Gekk <[email protected]>