steveloughran commented on PR #50765:
URL: https://github.com/apache/spark/pull/50765#issuecomment-3728531656

   Hey, this should be re-opened!
   
   FileStatus serialization isn't something that gets much thought.
   
   Some of the FileStatus subclasses add other info. In particular, s3a adds 
the etag (and optionally the version id), which is then used for change 
detection. ABFS has extra fields too. On a large s3 file the etag is actually 
the concatenation of the md5 checksum of each part, so it can get big.
   
   For a faster open on s3a, all that is needed is openFile() with the file 
length set as one of the builder options. I think abfs currently needs its 
own subclass of FileStatus.
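
   A minimal sketch of that openFile() call, assuming Hadoop 3.3.5 or later, 
where "fs.option.openfile.length" is a standard openFile() option. The class 
and method names (OpenWithLength, openKnownLength) are hypothetical and not 
taken from the PR. Because opt() rather than must() is used, a filesystem 
that does not recognise the key should simply ignore it.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/** Hypothetical helper: open a file when only its length is known. */
public class OpenWithLength {

  /** Standard openFile() option name; assumed available (Hadoop 3.3.5+). */
  static final String OPT_LENGTH = "fs.option.openfile.length";

  /**
   * Open a file, passing the already-known length as a builder hint so a
   * store such as s3a may be able to skip its own length probe.
   */
  public static FSDataInputStream openKnownLength(
      FileSystem fs, Path path, long length) throws IOException {
    // opt() marks the option as optional: unknown keys are ignored.
    CompletableFuture<FSDataInputStream> future = fs.openFile(path)
        .opt(OPT_LENGTH, Long.toString(length))
        .build();
    try {
      return future.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new InterruptedIOException("interrupted opening " + path);
    } catch (ExecutionException e) {
      // Unwrap so callers see the underlying IOException where there is one.
      Throwable cause = e.getCause();
      if (cause instanceof IOException) {
        throw (IOException) cause;
      }
      throw new IOException("failed to open " + path, cause);
    }
  }
}
```

   With something like this, only the path and a long would need to be 
serialized, rather than a filesystem-specific FileStatus subclass.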
   
   Could it just be the file length that is passed around? Is the full file 
status actually required?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


