ScrapCodes commented on issue #24585: [Spark-27664][SQL] Performance issue while listing large number of files on an object store.
URL: https://github.com/apache/spark/pull/24585#issuecomment-501143164

It was my mistake to open this issue without first preparing performance comparison data to back the claim. I went only by my theoretical understanding and the warning message. The reason is that processing 100K objects on a remote store itself takes minutes, while re-listing adds only a few seconds, a difference that is hard to distinguish because object stores perform differently on each invocation.

So I am closing this; please forgive me for wasting everyone's time. Making it configurable might still be helpful, but that can be revisited when there is a stronger case to consider.
