GitHub user gatorsmile opened a pull request:
https://github.com/apache/spark/pull/22990
[SPARK-25988] [SQL] Keep names unchanged when deduplicating the column names in Analyzer
## What changes were proposed in this pull request?
When queries do not refer to column names with a consistent case, users
can hit various errors. Below is a typical test failure.
```
Expected only partition pruning predicates: ArrayBuffer(isnotnull(tdate#237), (cast(tdate#237 as string) >= 2017-08-15));
org.apache.spark.sql.AnalysisException: Expected only partition pruning predicates: ArrayBuffer(isnotnull(tdate#237), (cast(tdate#237 as string) >= 2017-08-15));
    at org.apache.spark.sql.catalyst.catalog.ExternalCatalogUtils$.prunePartitionsByFilter(ExternalCatalogUtils.scala:146)
    at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.listPartitionsByFilter(InMemoryCatalog.scala:560)
    at org.apache.spark.sql.catalyst.catalog.SessionCatalog.listPartitionsByFilter(SessionCatalog.scala:925)
```
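To make the failure mode concrete, here is a minimal, self-contained sketch. It is not the Analyzer code from this patch: `Attr`, `DedupSketch`, and its three methods are hypothetical names, and the case-sensitive, name-based partition check is an assumption made for illustration. It contrasts a deduplication step that replaces the whole attribute (and so changes its name) with one that swaps only the expression ID and leaves the name unchanged.

```scala
// Hypothetical stand-in for a resolved column reference; not Spark's Attribute class.
final case class Attr(name: String, exprId: Long)

object DedupSketch {
  // Replaces the whole attribute, so the result carries whatever
  // spelling of the name the rewrite target happens to have.
  def dedupReplacingName(attr: Attr, rewrites: Map[Long, Attr]): Attr =
    rewrites.getOrElse(attr.exprId, attr)

  // Takes only the new expression ID from the rewrite target and
  // keeps the attribute's own name exactly as it was.
  def dedupKeepingName(attr: Attr, rewrites: Map[Long, Attr]): Attr =
    attr.copy(exprId = rewrites.get(attr.exprId).map(_.exprId).getOrElse(attr.exprId))

  // Assumed guard: every referenced column must match a partition
  // column name exactly (case-sensitive comparison).
  def onlyPartitionColumns(refs: Seq[Attr], partitionCols: Set[String]): Boolean =
    refs.map(_.name).toSet.subsetOf(partitionCols)
}
```

Under these assumptions, take a filter column `Attr("TDATE", 1L)` and a rewrite map `Map(1L -> Attr("tdate", 237L))`. `dedupReplacingName` yields a reference named `tdate`, which fails `onlyPartitionColumns` against `Set("TDATE")`. `dedupKeepingName` yields `Attr("TDATE", 237L)`, which passes.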
## How was this patch tested?
Added two test cases.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/gatorsmile/spark fix1283
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22990.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22990
----
commit 5e9f6f345b93d3370906c7b2d73ede15f4089c29
Author: gatorsmile <gatorsmile@...>
Date: 2018-11-09T05:27:37Z
fix
commit 17b725c79ad602df20c44cacb92e7c6abd84cdda
Author: gatorsmile <gatorsmile@...>
Date: 2018-11-09T05:33:58Z
fix
----