tiehexue opened a new pull request #32326:
URL: https://github.com/apache/spark/pull/32326


   There are three LocationStrategy options: PreferBrokers, PreferConsistent, 
and PreferFixed. My scenario needs a random one. I have many topic partitions 
that differ from one another in the records they contain, and I have a lot of 
executors. PreferBrokers does not help here. PreferConsistent makes things 
worse, because some executors always get the heavy tasks. PreferFixed does not 
help either: the mapping is fixed, and I would also have to create it 
manually.
   
   A random LocationStrategy should dispatch a topic partition to a different 
executor in each window. This would balance the load among Spark executors.
   
   ### What changes were proposed in this pull request?
   I added a new method, getExecutorHosts, to SparkContext, which returns the 
list of executor host names. I also added a PreferRandom case object with a 
random method that returns a "faked" map. That map's get method returns a 
randomly chosen host name.
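
   The idea of a map whose lookup ignores the key and returns a random host 
can be sketched as follows. This is a minimal illustration, not the code in 
this PR: `TopicPartition` is a stand-in for the Kafka class so the example 
runs without Kafka on the classpath, and `RandomHostLookup` and its host list 
are hypothetical names. In the PR, the host list would come from the new 
getExecutorHosts method.

```scala
import scala.util.Random

// Hypothetical stand-in for org.apache.kafka.common.TopicPartition,
// so the sketch compiles without Kafka on the classpath.
final case class TopicPartition(topic: String, partition: Int)

// Hypothetical map-like wrapper: every lookup ignores the key and
// returns a randomly chosen host from the supplied list.
final class RandomHostLookup(hosts: IndexedSeq[String], rng: Random = new Random()) {
  def get(tp: TopicPartition): Option[String] =
    if (hosts.isEmpty) None else Some(hosts(rng.nextInt(hosts.length)))
}

object RandomHostLookupDemo {
  def main(args: Array[String]): Unit = {
    // Assumed host names, for illustration only.
    val hosts = Vector("host-a", "host-b", "host-c")
    val lookup = new RandomHostLookup(hosts)
    val tp = TopicPartition("events", 0)
    // The same partition may be assigned a different host on each lookup.
    (1 to 5).foreach(i => println(s"lookup $i -> ${lookup.get(tp)}"))
  }
}
```

   Because the choice is made on every lookup, a heavy partition is not tied 
to one executor, which is the load-spreading effect described above.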
   
   ### Does this PR introduce _any_ user-facing change?
   Users will have another option that may be helpful.
   
   
   ### How was this patch tested?
   I constructed PreferFixed with a RandomLocationStrategyMap inside and 
verified it with 1000+ topic partitions against 2000+ executors. It worked.




