[GitHub] [spark] MaxGekk opened a new pull request #31094: [SPARK-33591][SQL][3.1] Recognize `null` in partition spec values

GitBox Fri, 08 Jan 2021 07:18:52 -0800


MaxGekk opened a new pull request #31094:
URL: https://github.com/apache/spark/pull/31094



   ### What changes were proposed in this pull request?
   1. Recognize `null` while parsing partition specs, and use `null` rather than the string `"null"` as the partition value.
   2. For the V1 catalog: replace `null` with `__HIVE_DEFAULT_PARTITION__`.
   3. For V2 catalogs: pass `null` as is, and let catalog implementations decide how to handle `null` partition values in the spec.
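The three steps above can be modeled in a few lines. This is a minimal Python sketch under stated assumptions, not Spark's actual Scala implementation. The names `parse_partition_value`, `parse_partition_spec`, `to_v1_spec` and `to_v2_spec` are hypothetical, and partition values are assumed to arrive as raw token strings from the parser.

```python
from typing import Dict, List, Optional, Tuple

# Name that Hive-style (V1) catalogs use for the partition holding NULL values.
HIVE_DEFAULT_PARTITION = "__HIVE_DEFAULT_PARTITION__"

# Hypothetical representation of a partition spec: column -> value (None = SQL NULL).
PartitionSpec = Dict[str, Optional[str]]


def parse_partition_value(token: str) -> Optional[str]:
    """Step 1 (hypothetical model): turn one raw value token into a partition value.

    The unquoted keyword `null` (any case) becomes None; a quoted literal such
    as 'null' keeps its text, so it stays the four-character string "null".
    """
    if token.lower() == "null":
        return None
    if len(token) >= 2 and token[0] == token[-1] and token[0] in ("'", '"'):
        return token[1:-1]  # strip the surrounding quotes
    return token


def parse_partition_spec(pairs: List[Tuple[str, str]]) -> PartitionSpec:
    """Build a spec such as PARTITION (p1 = null) from (column, token) pairs."""
    return {column: parse_partition_value(token) for column, token in pairs}


def to_v1_spec(spec: PartitionSpec) -> Dict[str, str]:
    """Step 2: the V1 catalog stores None as the Hive default partition name."""
    return {column: HIVE_DEFAULT_PARTITION if value is None else value
            for column, value in spec.items()}


def to_v2_spec(spec: PartitionSpec) -> PartitionSpec:
    """Step 3: V2 catalogs receive None unchanged and decide how to handle it."""
    return dict(spec)


if __name__ == "__main__":
    spec = parse_partition_spec([("p1", "null")])
    print(spec)              # {'p1': None}
    print(to_v1_spec(spec))  # {'p1': '__HIVE_DEFAULT_PARTITION__'}
    print(to_v2_spec(spec))  # {'p1': None}
```

In this model only the unquoted keyword is treated as a missing value; `PARTITION (p1 = 'null')` still yields the string `"null"`, which keeps the two cases distinguishable.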
   
   ### Why are the changes needed?
   Currently, `null` in a partition spec is interpreted as the string `"null"`, which can lead to incorrect results. For example:
   ```sql
   spark-sql> CREATE TABLE tbl5 (col1 INT, p1 STRING) USING PARQUET PARTITIONED BY (p1);
   spark-sql> INSERT INTO TABLE tbl5 PARTITION (p1 = null) SELECT 0;
   spark-sql> SELECT isnull(p1) FROM tbl5;
   false
   ```
   Even though we inserted a row into the partition with the `null` value, **the resulting table doesn't contain `null`**.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. After the changes, the example above works as expected:
   ```sql
   spark-sql> SELECT isnull(p1) FROM tbl5;
   true
   ```
   
   ### How was this patch tested?
   By running the affected test suites `SQLQuerySuite`, `AlterTablePartitionV2SQLSuite` and `v1/ShowPartitionsSuite`.
   
   Authored-by: Max Gekk <[email protected]>
   Signed-off-by: Wenchen Fan <[email protected]>
   (cherry picked from commit 157b72ac9fa0057d5fd6d7ed52a6c4b22ebd1dfc)
   Signed-off-by: Max Gekk <[email protected]>

