wangshisan commented on issue #25869: [SPARK-29189][SQL] Add an option to ignore block locations when listing file
URL: https://github.com/apache/spark/pull/25869#issuecomment-533809214

Yes, I see. A new API call was introduced in #24175, and it does improve things a lot. However, the new API still fetches all of the block location information, and in our benchmark this can take tens of seconds for a huge table. In my opinion, if a Spark cluster is deployed physically separate from the HDFS cluster, we do not need any of this block location information. That is what this PR is for.
