Github user shivaram commented on the pull request:

    https://github.com/apache/spark/pull/1697#issuecomment-50930229
  
    I ran some microbenchmarks as outlined at 
https://gist.github.com/shivaram/63620c47f0ad50106e0a
    The comments below the gist have some numbers that I got on my laptop.
    
    Overall I think we should just use an upper bound on the number of map tasks 
and not return any preferred locations if we have more than, say, 1000 map tasks. 
There might be some more optimization we can do in terms of filtering out zeros 
etc., but a simple heuristic would be a good and safe start for now.
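    The heuristic above could be sketched roughly as follows (a minimal illustration, not Spark's actual API; the function and threshold names are hypothetical):

    ```python
    # Hypothetical sketch of the proposed heuristic: skip the
    # preferred-location computation entirely once the number of map
    # tasks exceeds a fixed cap, since scanning every map status
    # becomes too expensive for the scheduler.

    MAX_MAPS_FOR_LOCALITY = 1000  # the "say 1000" upper bound from the comment

    def preferred_locations(map_output_sizes_by_host, num_maps):
        """Return hosts ordered by map output size, or [] if there are
        too many map tasks for the scan to be worthwhile."""
        if num_maps > MAX_MAPS_FOR_LOCALITY:
            return []
        # Drop hosts with zero bytes (the "filtering out zeros" idea),
        # then sort the rest by total output size, largest first.
        nonzero = {h: s for h, s in map_output_sizes_by_host.items() if s > 0}
        return sorted(nonzero, key=nonzero.get, reverse=True)
    ```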
    
    @rxin Thoughts?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---