tiehexue opened a new pull request #32326: URL: https://github.com/apache/spark/pull/32326
There are three LocationStrategy options: PreferBrokers, PreferConsistent, and PreferFixed. I have a scenario that needs a random one. There are many topic partitions, and they vary widely in the number of records they hold. I also have a lot of executors. PreferBrokers does not help here. PreferConsistent makes things worse, because some executors will always get the heavy tasks. PreferFixed does not help either: the mapping is fixed, not to mention that I would have to create it manually. A random LocationStrategy should dispatch a topic partition to different executors in different windows, which would balance the load among Spark executors.

### What changes were proposed in this pull request?
I added a new method, getExecutorHosts, to SparkContext, which provides a list of host names. I also added a PreferRandom case object with a random method that returns a "faked" map. That map's get method returns a randomly chosen host name.

### Does this PR introduce _any_ user-facing change?
Users will have another option that may be helpful.

### How was this patch tested?
I constructed PreferFixed with a RandomLocationStrategyMap inside and verified it with 1000+ topic partitions across 2000+ executors. It worked.
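The "faked" map idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the patch: the class name `RandomHostMap` is hypothetical, the host list is passed in by the caller instead of coming from the proposed getExecutorHosts method, and the key type is left generic so the sketch runs without Spark or Kafka on the classpath.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ThreadLocalRandom;

/**
 * Hypothetical sketch of a "faked" map: every lookup ignores the key
 * and returns a randomly chosen host from a fixed list.
 */
final class RandomHostMap<K> extends AbstractMap<K, String> {
    private final List<String> hosts;

    RandomHostMap(List<String> hosts) {
        if (hosts == null || hosts.isEmpty()) {
            throw new IllegalArgumentException("at least one host is required");
        }
        // Defensive copy so later changes to the caller's list have no effect.
        this.hosts = new ArrayList<>(hosts);
    }

    /** Returns a uniformly random host, regardless of the key. */
    @Override
    public String get(Object key) {
        int i = ThreadLocalRandom.current().nextInt(hosts.size());
        return hosts.get(i);
    }

    /** Every key is treated as present, since get() always has an answer. */
    @Override
    public boolean containsKey(Object key) {
        return true;
    }

    /**
     * No real entries are stored, so the entry set is empty. As a result,
     * size() is 0 and any code that copies or iterates the map sees nothing;
     * only direct get() calls produce hosts.
     */
    @Override
    public Set<Map.Entry<K, String>> entrySet() {
        return Collections.emptySet();
    }
}
```

With the Kafka integration available, a map like this keyed by TopicPartition could presumably be handed to `LocationStrategies.PreferFixed`, which is how the description says the patch was tested. Whether that works depends on the consumer of the map calling `get` directly instead of copying or iterating it, which this sketch does not verify.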
