GitHub user gengliangwang opened a pull request: https://github.com/apache/spark/pull/22311
[SPARK-25305][SQL] Respect attribute name in CollapseProject and ColumnPruning

## What changes were proposed in this pull request?

Currently, in the optimizer rule `CollapseProject`, the lower-level project is collapsed into the upper-level one, but the alias names from the lower level are propagated to the upper level.
In `ColumnPruning`, a `Project` is eliminated if its output attributes are `semanticEquals` to its child's output attributes, even when the names don't match.

Consider the following example:
```
val location = "/tmp/t"
val df = spark.range(10).toDF("id")
df.write.format("parquet").saveAsTable("tbl")
spark.sql("CREATE VIEW view1 AS SELECT id FROM tbl")
// LOCATION takes a string literal, so the path is quoted
spark.sql(s"CREATE TABLE tbl2(ID long) USING parquet LOCATION '$location'")
spark.sql("INSERT OVERWRITE TABLE tbl2 SELECT ID FROM view1")
println(spark.read.parquet(location).schema)
spark.table("tbl2").show()
```
The output column name in the schema will be `id` instead of `ID`, so the last query shows nothing from `tbl2`.
With debug messages enabled, we can see that the output name is changed from `ID` to `id`, and then the `outputColumns` in `InsertIntoHadoopFsRelationCommand` is changed in `RemoveRedundantAliases`.

With the fix proposed in this PR, the output name `ID` is no longer changed.

## How was this patch tested?

Unit test

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gengliangwang/spark fixEliminateProject

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22311.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22311

----
commit f94fdf7fd74a75c777b5b38ce970e0742d00091c
Author: Gengliang Wang <gengliang.wang@...>
Date:   2018-09-01T15:17:04Z

    Fix ColumnPruning and CollapseProject on eliminating Project

----
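
As an illustration of why a `semanticEquals`-only comparison can drop a rename, here is a toy model in plain Scala. It is a sketch under stated assumptions, not Spark's code and not the patch itself: `Attr`, `NameAwarePruning`, `sameOutputBySemantics` and `sameOutputByNameAndSemantics` are hypothetical names, and the model assumes that semantic equality of attributes compares only their expression IDs.

```scala
// Toy model, not Spark's classes: an attribute is a name plus an expression ID.
final case class Attr(name: String, exprId: Long) {
  // Assumption: semantic equality looks at the expression ID only and ignores the name.
  def semanticEquals(other: Attr): Boolean = exprId == other.exprId
}

object NameAwarePruning {
  // Name-insensitive check: a Project looks redundant when its output
  // matches the child's output by expression ID alone.
  def sameOutputBySemantics(project: Seq[Attr], child: Seq[Attr]): Boolean =
    project.size == child.size &&
      project.zip(child).forall { case (p, c) => p.semanticEquals(c) }

  // Name-aware check: additionally requires each output name to match exactly,
  // so a Project that only renames `id` to `ID` is kept.
  def sameOutputByNameAndSemantics(project: Seq[Attr], child: Seq[Attr]): Boolean =
    project.size == child.size &&
      project.zip(child).forall { case (p, c) => p.semanticEquals(c) && p.name == c.name }

  def main(args: Array[String]): Unit = {
    val childOutput = Seq(Attr("id", 1L))   // what the view produces
    val projectOutput = Seq(Attr("ID", 1L)) // what the INSERT selects

    println(sameOutputBySemantics(projectOutput, childOutput))        // true: Project would be dropped
    println(sameOutputByNameAndSemantics(projectOutput, childOutput)) // false: Project is kept
  }
}
```

In this model the first check treats the renaming `Project` as removable, which corresponds to the `ID` to `id` change described in the PR, while the second check keeps it because the names differ.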