GitHub user wangyum commented on the issue:
https://github.com/apache/spark/pull/22124
The root `Project` should be consistent with the schema of the target table, but currently it is not.
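For context, here is a minimal sketch of the kind of setup that produces the plans below (table, view, and column names are inferred from the plan output, not taken from the PR's tests, so treat them as hypothetical):
```scala
// Hypothetical repro: a parquet table with lower-case columns, re-exposed
// through a view, then written out via upper-case (case-insensitively
// resolved) column references.
spark.sql("CREATE TABLE table1 (col1 BIGINT, col2 BIGINT) USING parquet")
spark.sql("CREATE VIEW view1 AS SELECT col1, col2 FROM table1 WHERE col1 > -20")
// Assumed target table with upper-case column names:
spark.sql("CREATE TABLE table2 (COL1 BIGINT, COL2 BIGINT) USING parquet")
// COL1/COL2 resolve case-insensitively against view1, so the target schema
// uses the upper-case names while the scanned attributes stay lower-case.
spark.sql("INSERT OVERWRITE TABLE table2 SELECT COL1, COL2 FROM view1")
```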
**Before this PR**:
[dataColumns](https://github.com/apache/spark/blob/e6c6f90a55241905c420afbc803dd3bd6961d66b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala#L84):
`col1#8L,col2#9L`
[plan](https://github.com/apache/spark/blob/e6c6f90a55241905c420afbc803dd3bd6961d66b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala#L67):
```
*(1) Project [col1#8L, col2#9L]
+- *(1) Filter (isnotnull(col1#8L) && (col1#8L > -20))
   +- *(1) FileScan parquet default.table1[col1#8L,col2#9L] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:/tmp/yumwang/spark/parquet], PartitionFilters: [], PushedFilters: [IsNotNull(col1), GreaterThan(col1,-20)], ReadSchema: struct<col1:bigint,col2:bigint>
```
**After this PR**:
[dataColumns](https://github.com/apache/spark/blob/e6c6f90a55241905c420afbc803dd3bd6961d66b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala#L84):
`COL1#14L,COL2#15L`
[plan](https://github.com/apache/spark/blob/e6c6f90a55241905c420afbc803dd3bd6961d66b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala#L67):
```
*(1) Project [col1#8L AS COL1#14L, col2#9L AS COL2#15L]
+- *(1) Filter (isnotnull(col1#8L) && (col1#8L > -20))
   +- *(1) FileScan parquet default.table1[col1#8L,col2#9L] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:/tmp/yumwang/spark/parquet], PartitionFilters: [], PushedFilters: [IsNotNull(col1), GreaterThan(col1,-20)], ReadSchema: struct<col1:bigint,col2:bigint>
```
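The extra `Alias` expressions in the plan above are what restore the consistency. As a rough illustration of the idea (a sketch, not the PR's actual change; `alignOutputNames` and `expectedNames` are made-up names), the re-aliasing step boils down to:
```scala
import org.apache.spark.sql.catalyst.expressions.{Alias, NamedExpression}
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Project}

// Wrap the query in a Project that renames its output attributes to the
// names the target table expects, leaving already-matching names untouched.
def alignOutputNames(query: LogicalPlan, expectedNames: Seq[String]): LogicalPlan = {
  val projectList: Seq[NamedExpression] = query.output.zip(expectedNames).map {
    case (attr, name) if attr.name == name => attr               // already consistent
    case (attr, name)                      => Alias(attr, name)() // e.g. col1#8L AS COL1#14L
  }
  Project(projectList, query)
}
```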
**Before [SPARK-22834](https://issues.apache.org/jira/browse/SPARK-22834)**:
[dataColumns](https://github.com/apache/spark/blob/ec122209fb35a65637df42eded64b0203e105aae/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala#L124):
`COL1#19L,COL2#20L`
[queryExecution](https://github.com/apache/spark/blob/ec122209fb35a65637df42eded64b0203e105aae/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala#L104):
```
== Parsed Logical Plan ==
Project [COL1#19L, COL2#20L]
+- SubqueryAlias view1
   +- View (`default`.`view1`, [col1#19L,col2#20L])
      +- Project [col1#15L, col2#16L]
         +- Filter (col1#15L > cast(-20 as bigint))
            +- SubqueryAlias table1
               +- Relation[col1#15L,col2#16L] parquet

== Analyzed Logical Plan ==
COL1: bigint, COL2: bigint
Project [COL1#19L, COL2#20L]
+- SubqueryAlias view1
   +- View (`default`.`view1`, [col1#19L,col2#20L])
      +- Project [cast(col1#15L as bigint) AS col1#19L, cast(col2#16L as bigint) AS col2#20L]
         +- Project [col1#15L, col2#16L]
            +- Filter (col1#15L > cast(-20 as bigint))
               +- SubqueryAlias table1
                  +- Relation[col1#15L,col2#16L] parquet

== Optimized Logical Plan ==
Filter (isnotnull(col1#15L) && (col1#15L > -20))
+- Relation[col1#15L,col2#16L] parquet

== Physical Plan ==
*Project [col1#15L, col2#16L]
+- *Filter (isnotnull(col1#15L) && (col1#15L > -20))
   +- *FileScan parquet default.table1[col1#15L,col2#16L] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:/tmp/yumwang/spark/parquet], PartitionFilters: [], PushedFilters: [IsNotNull(col1), GreaterThan(col1,-20)], ReadSchema: struct<col1:bigint,col2:bigint>
```
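To see both sides of the comparison locally, the analyzed and physical output names can be pulled from a `DataFrame`'s `queryExecution` (a spark-shell sketch, assuming the hypothetical table/view setup above):
```scala
// Assumes the table1/view1 setup sketched earlier in this comment.
val df = spark.sql("SELECT COL1, COL2 FROM view1")
// Output names the analyzer settles on (what the target table expects):
df.queryExecution.analyzed.output.map(_.name)
// Output names of the physical plan handed to FileFormatWriter; before
// this PR these stay lower-case (col1, col2):
df.queryExecution.executedPlan.output.map(_.name)
```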