GitHub user wangyum opened a pull request:
https://github.com/apache/spark/pull/21883
[SPARK-24937][SQL] Datasource partition table should load empty partitions
## What changes were proposed in this pull request?
How to reproduce:
```sql
spark-sql> CREATE TABLE tbl AS SELECT 1;
spark-sql> CREATE TABLE tbl1 (c1 BIGINT, day STRING, hour STRING)
         > USING parquet
         > PARTITIONED BY (day, hour);
spark-sql> INSERT INTO TABLE tbl1 PARTITION (day = '2018-07-25', hour = '01')
         > SELECT * FROM tbl WHERE 1 = 0;
spark-sql> SHOW PARTITIONS tbl1;
spark-sql> CREATE TABLE tbl2 (c1 BIGINT)
         > PARTITIONED BY (day STRING, hour STRING);
spark-sql> INSERT INTO TABLE tbl2 PARTITION (day = '2018-07-25', hour = '01')
         > SELECT * FROM tbl WHERE 1 = 0;
spark-sql> SHOW PARTITIONS tbl2;
day=2018-07-25/hour=01
spark-sql>
```
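1. Users will be confused about whether the partition data of `tbl1` was
generated, because `SHOW PARTITIONS tbl1` returns nothing after the insert.
2. This is inconsistent with Hive table behavior: `tbl2` lists the partition
after the same empty insert.
This PR fixes these issues.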
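For reference, a sketch of the intended result, assuming the change behaves as the PR title describes and the datasource table registers the empty partition the way `tbl2` does. The output line for `tbl1` is the expected behavior, not output captured from a patched build:

```sql
spark-sql> INSERT INTO TABLE tbl1 PARTITION (day = '2018-07-25', hour = '01')
         > SELECT * FROM tbl WHERE 1 = 0;
spark-sql> SHOW PARTITIONS tbl1;
day=2018-07-25/hour=01
```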
## How was this patch tested?
unit tests
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/wangyum/spark SPARK-24937
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21883.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21883
----
commit 90f99fc6e645a65232cc52c81717402df1fd97df
Author: Yuming Wang <yumwang@...>
Date: 2018-07-26T15:24:47Z
Datasource partition table should load empty partitions.
----