GitHub user dorx commented on the pull request:
https://github.com/apache/spark/pull/916#issuecomment-44876657
@concretevitamin it's probably not faster on individual runs (in fact there's
slightly more computation per example). What this gains us is the ability to
guarantee, with high confidence, that the first pass yields enough examples to
meet the sample size (so we're not stuck in the while loop). I'd love to
see some kind of performance comparison of average run time over lots of runs
(the old version is probably going to be penalized by having to resample on
occasion).
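
To make the trade-off concrete, here is a rough Python sketch (not the code in
this PR, just an illustration). It assumes one Bernoulli sampling pass over n
items and compares how often that pass comes up short of k items when the
fraction is the naive k/n versus a padded fraction derived from a Chernoff
lower-tail bound. The names padded_fraction and shortfall_rate, and the
default delta of 1e-4, are made up for this example.

```python
import math
import random


def padded_fraction(k, n, delta=1e-4):
    """Fraction q such that keeping each of n items with probability q
    yields at least k items with probability >= 1 - delta.

    Obtained by solving the Chernoff lower-tail bound
    exp(-(n*q - k)**2 / (2*n*q)) = delta for q (illustrative assumption).
    """
    p = k / n
    gamma = -math.log(delta) / n
    return min(1.0, p + gamma + math.sqrt(gamma * gamma + 2 * gamma * p))


def shortfall_rate(k, n, fraction, trials, rng):
    """Share of trials in which one Bernoulli pass keeps fewer than k items,
    i.e. how often a resample would be needed."""
    short = 0
    for _ in range(trials):
        kept = sum(1 for _ in range(n) if rng.random() < fraction)
        if kept < k:
            short += 1
    return short / trials


if __name__ == "__main__":
    rng = random.Random(42)
    n, k, trials = 2000, 100, 300
    print("naive :", shortfall_rate(k, n, k / n, trials, rng))
    print("padded:", shortfall_rate(k, n, padded_fraction(k, n), trials, rng))
```

The padded fraction keeps more items per pass (the extra computation per
example), but the naive fraction should fall short roughly half the time and
force a resample, which is where the average run time is likely to differ.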