pan3793 commented on code in PR #50765:
URL: https://github.com/apache/spark/pull/50765#discussion_r2348014717


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala:
##########
@@ -47,26 +47,25 @@ import org.apache.spark.util.NextIterator
  * that need to be prepended to each row.
  *
 * @param partitionValues value of partition columns to be prepended to each row.
- * @param filePath URI of the file to read
  * @param start the beginning offset (in bytes) of the block.
  * @param length number of bytes to read.
- * @param modificationTime The modification time of the input file, in milliseconds.
- * @param fileSize The length of the input file (not the block), in bytes.
+ * @param fileStatus The FileStatus instance of the file to read.
 * @param otherConstantMetadataColumnValues The values of any additional constant metadata columns.
  */
 case class PartitionedFile(
     partitionValues: InternalRow,
-    filePath: SparkPath,
     start: Long,
     length: Long,
+    fileStatus: FileStatus,

Review Comment:
   @cloud-fan the change basically moves the RPC cost from executor => storage service to driver => executors. In my env (HDFS with RBF), the latter is much cheaper than the former. I don't have a cloud env, so I can't give numbers for object storage services like S3.
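   To make the tradeoff concrete, here is a hypothetical, self-contained sketch of the two designs; none of these names (`StorageService`, `rpcsWithPathOnly`, etc.) are real Spark or HDFS APIs, and the RPC counter is a stand-in for the metadata calls (NameNode `getFileStatus` in HDFS, HEAD requests in S3). It only counts where the file-metadata lookups land in each design: per-task on the executors when tasks carry only a path, versus a single driver-side listing when the `FileStatus` is shipped inside each task.

```scala
// Minimal stand-in for Hadoop's FileStatus (hypothetical, for illustration only).
case class FileStatus(path: String, length: Long, modificationTime: Long)

// Stands in for the storage service (HDFS NameNode / S3 endpoint) and counts RPCs.
class StorageService(files: Map[String, FileStatus]) {
  var rpcCount = 0
  def getFileStatus(path: String): FileStatus = { rpcCount += 1; files(path) }
  def listStatus(): Seq[FileStatus] = { rpcCount += 1; files.values.toSeq }
}

object RpcCostSketch {
  private val paths = (1 to 100).map(i => f"/data/part-$i%03d")
  private def newStorage() = new StorageService(
    paths.map(p => p -> FileStatus(p, 1024L, 0L)).toMap)

  // Before the change: each task carries only a file path, so every
  // executor resolves the metadata itself with its own storage RPC.
  def rpcsWithPathOnly(): Int = {
    val storage = newStorage()
    paths.foreach(p => storage.getFileStatus(p)) // one RPC per task
    storage.rpcCount
  }

  // After the change: the driver lists the files once and ships the
  // FileStatus inside each task (cost moves to driver => executor traffic).
  def rpcsWithShippedStatus(): Int = {
    val storage = newStorage()
    val statuses = storage.listStatus()          // single driver-side RPC
    statuses.foreach(_ => ())                    // executors reuse it, no RPC
    storage.rpcCount
  }

  def main(args: Array[String]): Unit = {
    println(s"executor => storage RPCs: ${rpcsWithPathOnly()}")
    println(s"storage RPCs with shipped FileStatus: ${rpcsWithShippedStatus()}")
  }
}
```

   Whether the shipped-status design wins depends on the relative cost of the two hops, which is the point of the comment above: with HDFS RBF the extra driver => executor payload is cheap compared to per-task NameNode RPCs.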



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

