pan3793 commented on PR #50765: URL: https://github.com/apache/spark/pull/50765#issuecomment-4600739008
> these optimisations (passing in the filestatus and re-using the same stream for footer + data reads) are very useful for cloud connectors.

@ahmarsuhail the latter was split out into https://github.com/apache/spark/pull/52384 and landed in Spark 4.1.0. Since I don't have experience with cloud storage services, I may not be able to evaluate the benefit.

For the remaining part mentioned in https://github.com/apache/spark/pull/50765#discussion_r2357607758

> constructing FileStatus from the executor side directly

This requires broad testing across different storage backends. I'm not sure whether a basic `FileStatus` with only the file path and offset/length is sufficient for all storage systems.
