Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/4420#issuecomment-74750269
  
    I wonder if we could improve these heuristics instead of adding user 
flags (for many users it would be hard to figure out how to set these). The 
heuristic of skipping the first 1000 entries assumes that the entries are small. 
I actually don't see what we gain by skipping those entries, since we still 
perform the sampling during that time (the sampling is the only really 
expensive part of these heuristics). @andrewor14, do you remember why this is 
there?
    
    Also, maybe for the first 32 elements we could check every element, then 
fall back to checking every 32nd element. This would handle the case where 
there are some extremely large objects.
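    The proposed cadence could be sketched roughly like this (a minimal Python 
sketch only, not Spark's actual SizeTracker; the class, method names, and 
constants are illustrative assumptions):

```python
class SampledSizeTracker:
    """Illustrative sketch: sample every element for the first 32 updates,
    then only every 32nd update, extrapolating the total from the samples."""

    DENSE_PHASE = 32    # check every element for the first 32 updates
    SAMPLE_PERIOD = 32  # afterwards, check only every 32nd update

    def __init__(self, estimate_size):
        # estimate_size is the expensive per-element size estimator
        self._estimate = estimate_size
        self._updates = 0
        self._sampled_bytes = 0
        self._sampled_count = 0

    def after_update(self, element):
        self._updates += 1
        dense = self._updates <= self.DENSE_PHASE
        if dense or self._updates % self.SAMPLE_PERIOD == 0:
            self._sampled_bytes += self._estimate(element)
            self._sampled_count += 1

    def estimated_size(self):
        # Extrapolate: average sampled element size times total update count
        if self._sampled_count == 0:
            return 0
        avg = self._sampled_bytes / self._sampled_count
        return int(avg * self._updates)
```

    Because the first 32 updates are all sampled, a single extremely large 
object inserted early inflates the running average immediately, instead of 
being missed until the next periodic sample.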


