steveloughran commented on issue #25899: [SPARK-29089][SQL] Parallelize 
blocking FileSystem calls in DataSource#checkAndGlobPathIfNecessary
URL: https://github.com/apache/spark/pull/25899#issuecomment-549465360
 
 
   Nice experiment!
   
   I guess in-EC2 you're limited by the number of cores, of course, but latency is 
nice and low. Remotely, latency is worse, so anything we can do in 
parallel threads brings some tangible benefits.
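
   To make the latency point concrete, here is a minimal Python sketch, not the code from the patch. `fake_list_status`, the 50 ms delay, the bucket paths and the pool size of 8 are all hypothetical stand-ins for a blocking FileSystem call and its configuration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

LATENCY_S = 0.05  # assumed round-trip time per call; purely illustrative


def fake_list_status(path):
    """Hypothetical stand-in for a blocking, latency-bound FileSystem call."""
    time.sleep(LATENCY_S)
    return [f"{path}/part-0000"]


def glob_serial(paths):
    # One call at a time: total time is roughly len(paths) * latency.
    return [fake_list_status(p) for p in paths]


def glob_parallel(paths, threads=8):
    # Calls overlap on the pool; map() returns results in input order.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(fake_list_status, paths))


def timed(fn, *args):
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


paths = [f"s3a://bucket/table/day={d}" for d in range(16)]
serial_result, serial_time = timed(glob_serial, paths)
parallel_result, parallel_time = timed(glob_parallel, paths)
print(f"serial {serial_time:.2f}s, parallel {parallel_time:.2f}s")
```

   The higher the per-call latency, the more the serial version pays for waiting, which is why the gain should show up more remotely than in-EC2.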
   
   In both local and remote S3 interaction, rename() is faked with a COPY, which 
runs at 6-10MB/s; that can be done via the thread pool too if you can configure the 
AWS SDK to split up a large copy into parallel parts. That shares the same 
pools, so it's useful to have some capacity there on any process renaming things.
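
   A rough sketch of the "split a large copy into parallel parts" idea, in Python. This is not the AWS SDK's API: `split_into_parts`, `parallel_copy` and the part size are hypothetical, and an in-memory byte copy stands in for the per-part COPY requests.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_parts(total_size, part_size):
    """Return inclusive (first_byte, last_byte) ranges covering total_size bytes."""
    return [(start, min(start + part_size, total_size) - 1)
            for start in range(0, total_size, part_size)]


def parallel_copy(source, part_size, pool):
    """Copy source part by part on a shared pool (stand-in for per-part COPY calls)."""
    dest = bytearray(len(source))

    def copy_part(byte_range):
        first, last = byte_range
        # Each part writes a disjoint slice, so parts can run in any order.
        dest[first:last + 1] = source[first:last + 1]

    # list() forces every part to finish and surfaces any exception.
    list(pool.map(copy_part, split_into_parts(len(source), part_size)))
    return bytes(dest)


source = bytes(range(256)) * 40  # 10,240 bytes of sample data
parts = split_into_parts(len(source), 4096)

# The pool is passed in rather than created per copy, mirroring the point
# that copies share the same pool as other work.
with ThreadPoolExecutor(max_workers=4) as shared_pool:
    copied = parallel_copy(source, part_size=4096, pool=shared_pool)
```

   Because the pool is shared, a big rename competes with everything else queued on it, hence the value of spare capacity.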
