prakharjain09 commented on a change in pull request #34575:
URL: https://github.com/apache/spark/pull/34575#discussion_r758897464
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala
##########
@@ -57,11 +66,15 @@ case class PartitionedFile(
class FileScanRDD(
@transient private val sparkSession: SparkSession,
readFunction: (PartitionedFile) => Iterator[InternalRow],
- @transient val filePartitions: Seq[FilePartition])
+ @transient val filePartitions: Seq[FilePartition],
+ val requiredSchema: StructType = StructType(Seq.empty),
Review comment:
Also, why is the default value for `requiredSchema` an empty `StructType`? I
understand that it is only used when `metadataStructCol` is passed, but this
dependency is not very clear from the signature alone.
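For illustration, one hypothetical way to make the coupling explicit would be
to bundle the two parameters so callers pass both or neither (the
`MetadataColumnInfo` name and the `require` guard below are sketches, not code
from this PR):

```scala
import org.apache.spark.sql.catalyst.expressions.AttributeReference
import org.apache.spark.sql.types.StructType

// Sketch only: tie the coupled parameters together in one type so the
// dependency is visible in FileScanRDD's signature instead of hiding
// behind an empty-schema default.
case class MetadataColumnInfo(
    requiredSchema: StructType,
    metadataStructCol: AttributeReference) {
  // Fail fast if the schema the metadata column is appended to is missing.
  require(requiredSchema.nonEmpty,
    "requiredSchema must be non-empty when a metadata column is requested")
}
```

`FileScanRDD` could then take a single
`metadataColumnInfo: Option[MetadataColumnInfo] = None` parameter in place of
two independently defaulted ones.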
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
##########
@@ -194,10 +195,17 @@ case class FileSourceScanExec(
disableBucketedScan: Boolean = false)
extends DataSourceScanExec {
+ lazy val metadataStructCol: Option[AttributeReference] =
Review comment:
Are we assuming that only one column in `output` corresponds to
`MetadataAttribute(_)`? Is there some place in the code where we enforce this?
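As a sketch of the kind of check being asked about (assuming the enclosing
`FileSourceScanExec` scope, where `output: Seq[Attribute]`, and the
`MetadataAttribute` extractor shown in the diff above; the `require` is
illustrative, not the PR's code):

```scala
lazy val metadataStructCol: Option[AttributeReference] = {
  // Collect every attribute the MetadataAttribute extractor matches, then
  // assert the "at most one metadata column" assumption explicitly rather
  // than silently taking the first match.
  val metadataCols = output.collect { case MetadataAttribute(attr) => attr }
  require(metadataCols.size <= 1,
    s"Expected at most one metadata struct column, found ${metadataCols.size}")
  metadataCols.headOption
}
```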
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]