Github user mccheah commented on the pull request:

    https://github.com/apache/spark/pull/3656#issuecomment-71526176

I'm seeing some problems that this PR could address, so I'm reviving this thread. @lawlerd the configurable count would help: if the individual objects are known to be large, sampling can be set to run more frequently. If sampling every 32 times is too passive, a more aggressive option can be configured, say sampling every 5 times.
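
To illustrate the point, here is a minimal sketch of interval-based sampling with a configurable count. It is not the code in this PR: `PeriodicSizeChecker`, `sample_interval`, `memory_threshold`, and `elements_before_spill` are hypothetical names, and it assumes the size check runs once per fixed number of insertions.

```python
class PeriodicSizeChecker:
    """Runs a (possibly expensive) size estimate once every `sample_interval` insertions.

    Hypothetical illustration only; not Spark's actual implementation.
    """

    def __init__(self, memory_threshold, sample_interval=32):
        if sample_interval < 1:
            raise ValueError("sample_interval must be >= 1")
        self.memory_threshold = memory_threshold
        self.sample_interval = sample_interval
        self.elements_read = 0

    def should_spill(self, estimate_size):
        """Record one insertion; return True only if a check ran and the threshold was met."""
        self.elements_read += 1
        # Skip the estimate entirely on insertions that are not sampling points.
        if self.elements_read % self.sample_interval != 0:
            return False
        return estimate_size() >= self.memory_threshold


def elements_before_spill(element_size, threshold, interval):
    """Count how many equally sized elements are buffered before a spill is triggered."""
    buffered = 0
    checker = PeriodicSizeChecker(threshold, interval)
    while True:
        buffered += element_size
        if checker.should_spill(lambda: buffered):
            return checker.elements_read
```

In this sketch, with elements of size 100 and a threshold of 250, an interval of 32 does not notice the threshold until 32 elements are buffered, while an interval of 5 notices after 5. The larger the individual objects, the further a passive interval can overshoot the threshold before the next check.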