[GitHub] [spark] cloud-fan commented on a change in pull request #31993: [SPARK-34897][SQL] Support reconcile schemas based on index after nested column pruning

GitBox Sun, 18 Apr 2021 22:48:48 -0700


cloud-fan commented on a change in pull request #31993:
URL: https://github.com/apache/spark/pull/31993#discussion_r615556485




##########
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFsRelation.scala
##########
@@ -40,7 +40,7 @@ import org.apache.spark.sql.types.{StructField, StructType}
 case class HadoopFsRelation(
     location: FileIndex,
     partitionSchema: StructType,
-    dataSchema: StructType,
+    dataSchema: StructType, // The top-level columns should not be pruned. Please see SPARK-34897.

Review comment:
       Can we add more details? Something like:
   ```
   // The top-level columns in `dataSchema` should match the actual physical file schema, otherwise
   // the ORC data source may not work with the by-ordinal mode.
   ```
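To make the by-ordinal concern concrete, here is a minimal toy model in plain Scala. It assumes that in by-ordinal mode a requested column's position in `dataSchema` is used as the index into the file's physical top-level columns (for example, positional names such as `_col0`, `_col1`, ...). The object and method names are hypothetical, schemas are reduced to lists of column names, and this is not Spark's actual ORC reader code. It only shows why pruning a top-level column from `dataSchema` shifts every later ordinal.

```scala
// Hypothetical toy model, not Spark's API: a schema is just a list of
// top-level column names, in order.
object ByOrdinalSketch {
  // By-ordinal resolution (assumed): find the requested column's position in
  // `dataSchema`, then read the physical column at that same position.
  def resolveByOrdinal(
      dataSchema: Seq[String],
      physicalColumns: Seq[String],
      requested: Seq[String]): Seq[String] =
    requested.map { name =>
      val ordinal = dataSchema.indexOf(name)
      require(ordinal >= 0, s"column $name is not in dataSchema")
      physicalColumns(ordinal)
    }

  def main(args: Array[String]): Unit = {
    val physical = Seq("_col0", "_col1", "_col2")

    // dataSchema keeps all top-level columns: "c" resolves to _col2 (correct).
    val full = resolveByOrdinal(Seq("a", "b", "c"), physical, Seq("c"))
    println(full.mkString(", "))

    // Top-level column "b" pruned from dataSchema: "c" now resolves to _col1,
    // i.e. the wrong physical column.
    val pruned = resolveByOrdinal(Seq("a", "c"), physical, Seq("c"))
    println(pruned.mkString(", "))
  }
}
```

In this model, nested fields could still be pruned inside a struct column without moving any top-level ordinal. That is why the constraint applies to the top-level columns only.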



