steveloughran commented on pull request #34762: URL: https://github.com/apache/spark/pull/34762#issuecomment-1005776551
Only just seen this. How much throttling are you actually seeing? And, assuming you are using the s3a client, have you set directory marker retention to "keep"? It is often the attempted deletion of directory markers that triggers the problem ... effectively it is a form of write amplification.

I'm thinking that a feature to go into s3a this year should be configurable rate limiting through the Guava RateLimiter; I'm using this in the abfs committer to keep committer IO below the limits where throttling starts to cause problems with renames. This is all per process; things like random filenames are still going to be critical to spread load on S3.
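To illustrate the per-process rate limiting idea above, here is a minimal token-bucket sketch. It is not Guava's RateLimiter and not code from s3a or the abfs committer; the class `TokenBucket` and its parameters `rate_per_sec` and `capacity` are hypothetical names chosen for this example, and the injectable `clock`/`sleep` arguments exist only to make the behaviour easy to check.

```python
import threading
import time


class TokenBucket:
    """Illustrative per-process limiter (hypothetical; a stand-in for Guava's RateLimiter)."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic, sleep=time.sleep):
        if rate_per_sec <= 0 or capacity <= 0:
            raise ValueError("rate_per_sec and capacity must be positive")
        self.rate = float(rate_per_sec)    # tokens added per second
        self.capacity = float(capacity)    # maximum burst size
        self.tokens = float(capacity)      # start with a full bucket
        self.clock = clock
        self.sleep = sleep
        self.last = clock()
        self.lock = threading.Lock()       # one limiter shared by all threads in the process

    def _refill(self):
        # Credit tokens for the time elapsed since the last refill, capped at capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def acquire(self, permits=1):
        """Block until `permits` tokens are available; return the seconds spent waiting."""
        if permits > self.capacity:
            raise ValueError("permits exceeds bucket capacity")
        with self.lock:                    # callers queue here while one of them sleeps
            self._refill()
            waited = 0.0
            if self.tokens < permits:
                waited = (permits - self.tokens) / self.rate
                self.sleep(waited)
                self._refill()
            # May dip fractionally below zero if sleep() returned early; the
            # small debt is repaid by the next caller's wait.
            self.tokens -= permits
            return waited
```

A caller would invoke `limiter.acquire()` before each store request it wants to throttle (for example a delete or a rename). As noted above, the limit applies only within one process, so it does not replace spreading load across the store.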
