GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/22990

    [SPARK-25988] [SQL] Keep names unchanged when deduplicating the column names in Analyzer

    ## What changes were proposed in this pull request?

    When queries refer to the same column using names that differ in case, users might hit various errors. Below is a typical test failure they can hit.

    ```
    Expected only partition pruning predicates: ArrayBuffer(isnotnull(tdate#237), (cast(tdate#237 as string) >= 2017-08-15));
    org.apache.spark.sql.AnalysisException: Expected only partition pruning predicates: ArrayBuffer(isnotnull(tdate#237), (cast(tdate#237 as string) >= 2017-08-15));
        at org.apache.spark.sql.catalyst.catalog.ExternalCatalogUtils$.prunePartitionsByFilter(ExternalCatalogUtils.scala:146)
        at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.listPartitionsByFilter(InMemoryCatalog.scala:560)
        at org.apache.spark.sql.catalyst.catalog.SessionCatalog.listPartitionsByFilter(SessionCatalog.scala:925)
    ```

    ## How was this patch tested?

    Added two test cases.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark fix1283

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22990.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22990

----
commit 5e9f6f345b93d3370906c7b2d73ede15f4089c29
Author: gatorsmile <gatorsmile@...>
Date:   2018-11-09T05:27:37Z

    fix

commit 17b725c79ad602df20c44cacb92e7c6abd84cdda
Author: gatorsmile <gatorsmile@...>
Date:   2018-11-09T05:33:58Z

    fix

----
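
For readers unfamiliar with the failure mode named in the pull request title, the following is a minimal sketch of the idea, not Spark's actual code. It assumes a simplified model in which an attribute is a (name, expression id) pair, deduplication remaps attributes through a lookup table, and the partition pruning check compares names exactly. Every name in it (`Attr`, `dedup_replace_whole`, `dedup_keep_name`, `non_partition_refs`, the `tDate` spelling, and the id `300`) is hypothetical and chosen only for illustration.

```python
from typing import Dict, List, NamedTuple, Set


class Attr(NamedTuple):
    """Hypothetical stand-in for a resolved column reference."""
    name: str
    expr_id: int


def dedup_replace_whole(attr: Attr, attr_map: Dict[int, Attr]) -> Attr:
    # Assumed problematic behaviour: the mapped attribute is swapped in
    # wholesale, so its spelling of the column name replaces the original.
    return attr_map.get(attr.expr_id, attr)


def dedup_keep_name(attr: Attr, attr_map: Dict[int, Attr]) -> Attr:
    # Behaviour suggested by the PR title: take only the new expression id
    # and leave the original name unchanged.
    replacement = attr_map.get(attr.expr_id, attr)
    return Attr(attr.name, replacement.expr_id)


def non_partition_refs(refs: List[Attr], partition_cols: Set[str]) -> List[Attr]:
    # Assumed exact-match check: any reference whose name is not literally
    # a partition column name is reported as a non-pruning predicate.
    return [a for a in refs if a.name not in partition_cols]


partition_cols = {"tdate"}
original = Attr("tdate", 237)
attr_map = {237: Attr("tDate", 300)}  # same column, different spelling

swapped = dedup_replace_whole(original, attr_map)
kept = dedup_keep_name(original, attr_map)

print(non_partition_refs([swapped], partition_cols))
print(non_partition_refs([kept], partition_cols))
```

Under these assumptions, the wholesale swap leaves a reference named `tDate` that no longer matches the partition column `tdate`, which is the kind of mismatch that could produce an "Expected only partition pruning predicates" error, while the name-preserving variant passes the check.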