sunchao commented on pull request #30019:
URL: https://github.com/apache/spark/pull/30019#issuecomment-713055884


   Thanks @steveloughran for putting all the context on S3A/ABFS, and sorry for the late comment.
   
   > HEAD: don't call exists/getFileStatus/etc if you know a file is there. It's also better to do a try { open() } catch (f: FileNotFoundException) than it is for a probe + open, as you will save a HEAD.
   
   Good point. I guess this applies to other FS impls in general as well. I can spend some time checking the Spark codebase for this pattern and finding potential improvements.
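   For illustration, a minimal Java sketch of the two patterns using `java.io` against the local filesystem (not the Hadoop `FileSystem` API, so the names differ): probing with `exists()` before opening costs a separate metadata call (a HEAD request on S3A), while catching `FileNotFoundException` folds the probe into the open itself.

   ```java
   import java.io.File;
   import java.io.FileInputStream;
   import java.io.FileNotFoundException;
   import java.io.IOException;
   import java.io.InputStream;

   public class OpenPattern {
       // Anti-pattern: probe + open is two filesystem operations
       // (exists() maps to an extra HEAD request on S3A).
       static InputStream probeThenOpen(File f) throws IOException {
           if (!f.exists()) {
               throw new FileNotFoundException(f.getPath());
           }
           return new FileInputStream(f);
       }

       // Preferred: a single open; a missing file surfaces as
       // FileNotFoundException, so no up-front probe is needed.
       static InputStream openDirectly(File f) throws IOException {
           return new FileInputStream(f);
       }

       public static void main(String[] args) throws IOException {
           try {
               openDirectly(new File("does-not-exist.txt")).close();
           } catch (FileNotFoundException e) {
               // The "probe" happened as part of the open itself.
               System.out.println("missing file handled without a separate probe");
           }
       }
   }
   ```

   The same shape would apply anywhere Spark does an `fs.exists(path)` immediately followed by `fs.open(path)`.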
   
   > The new openFile() API will let the caller specify seek policy 
(sequential, random, adaptive,...) and, if you pass in the known file length, 
skip the HEAD. It's also an async operation on S3A, even when a HEAD is needed.
   
   Cool. This is in 3.3.0+ though, so we can perhaps explore it once Spark makes the switch.
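   As a rough sketch of why passing the known length helps (the interface below is invented purely for illustration; it is not the actual Hadoop `openFile()` builder API): when the caller already knows the length from a directory listing, the open can skip its own metadata probe entirely.

   ```java
   import java.util.concurrent.atomic.AtomicInteger;

   public class OpenFileSketch {
       // Observable cost: number of simulated metadata probes
       // (HEAD requests against an object store).
       static final AtomicInteger headCount = new AtomicInteger();

       // Hypothetical store: a probe is required to learn the length.
       static long headRequest(String path) {
           headCount.incrementAndGet();
           return 1024L; // pretend length returned by the store
       }

       // Open without a known length: must issue a probe first.
       static long open(String path) {
           return headRequest(path);
       }

       // Open with a length the caller already has (e.g. from a LIST):
       // no probe needed.
       static long open(String path, long knownLength) {
           return knownLength;
       }

       public static void main(String[] args) {
           open("s3a://bucket/a");        // issues 1 probe
           open("s3a://bucket/b", 2048L); // issues 0 probes
           System.out.println("probes issued: " + headCount.get());
       }
   }
   ```

   Since Spark's file listing already carries the length in its file status, wiring it through to the open call is the part that saves the round trip.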
   
   > LIST ...
   
   Yes, I wish Spark could benefit from the paged listing offered by `FileSystem` impls (I know Presto has [optimizations](https://github.com/prestodb/presto/blob/master/presto-hive/src/main/java/com/facebook/presto/hive/BackgroundHiveSplitLoader.java) around this, and it seems to work really well for them), but it will need some significant changes.
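   To sketch what incremental listing buys (plain Java standing in for Hadoop's `RemoteIterator`; the in-memory "store" below is purely illustrative): pages are fetched only as the consumer advances, so the first splits can be processed before the full listing has completed.

   ```java
   import java.util.ArrayList;
   import java.util.Iterator;
   import java.util.List;
   import java.util.NoSuchElementException;

   public class PagedListing implements Iterator<String> {
       static final int PAGE_SIZE = 2;
       final List<String> allFiles;   // stands in for the remote store
       int nextOffset = 0;            // where the next page fetch starts
       List<String> page = new ArrayList<>();
       int posInPage = 0;
       int pagesFetched = 0;          // observable cost: remote LIST calls

       PagedListing(List<String> allFiles) { this.allFiles = allFiles; }

       // One simulated LIST round trip, returning up to PAGE_SIZE entries.
       void fetchPage() {
           pagesFetched++;
           int end = Math.min(nextOffset + PAGE_SIZE, allFiles.size());
           page = allFiles.subList(nextOffset, end);
           nextOffset = end;
           posInPage = 0;
       }

       @Override public boolean hasNext() {
           if (posInPage < page.size()) return true;
           if (nextOffset >= allFiles.size()) return false;
           fetchPage(); // fetch lazily, only when the consumer advances
           return posInPage < page.size();
       }

       @Override public String next() {
           if (!hasNext()) throw new NoSuchElementException();
           return page.get(posInPage++);
       }

       public static void main(String[] args) {
           PagedListing it = new PagedListing(List.of("a", "b", "c", "d", "e"));
           // Consuming only the first entry needed a single LIST call.
           System.out.println(it.next() + ", pages fetched: " + it.pagesFetched);
       }
   }
   ```

   The eager alternative materializes all pages into one array before any work starts, which is where the latency difference on large directories would come from.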
   
   > What you can do right now is add an option to log the toString() values of 
input streams/output streams/remoteiterators at debug level to some 
performance-only log
   
   Sounds good. Let me try to add these.
   
   > I'd love to get some comparisons here.
   
   I'll see what I can do ... without the incremental listing support, I don't think we'll see much difference here? Plus I don't have a testing environment for S3A at hand :(
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
