[GitHub] spark pull request: SPARK-1939 Refactor takeSample method in RDD t...

dorx Mon, 02 Jun 2014 11:56:00 -0700

Github user dorx commented on the pull request:

    https://github.com/apache/spark/pull/916#issuecomment-44876657
  
    @concretevitamin Probably not faster on an individual run (in fact, there's
slightly more computation per example). What this gains us is a guarantee that,
with high confidence, we get enough examples to meet the sample size on the
first try, so we're not stuck in the while loop. I'd love to see a performance
comparison of average run time over many runs (the old version will probably be
penalized by having to resample occasionally).
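
To make the idea concrete, here is a minimal sketch in plain Python, not the code from the pull request. It assumes Bernoulli sampling without replacement and picks the sampling rate from a standard Chernoff lower-tail bound, so that the chance of drawing fewer than the requested number of examples is at most `delta`. The names `oversample_fraction`, `take_sample`, and the default `delta` are illustrative assumptions.

```python
import math
import random


def oversample_fraction(sample_size, total, delta=1e-4):
    """Bernoulli rate q with P(fewer than sample_size picks) <= delta.

    Assumption: derived from the Chernoff bound
    P(X <= (1 - eps) * mu) <= exp(-eps**2 * mu / 2), solved for mu = total * q.
    """
    if sample_size >= total:
        return 1.0
    log_term = -math.log(delta)
    expected = sample_size + log_term + math.sqrt(
        log_term ** 2 + 2 * sample_size * log_term
    )
    return min(1.0, expected / total)


def take_sample(items, sample_size, delta=1e-4, rng=None):
    """Return sample_size distinct items chosen uniformly at random."""
    rng = rng or random.Random()
    items = list(items)
    if sample_size >= len(items):
        rng.shuffle(items)
        return items
    q = oversample_fraction(sample_size, len(items), delta)
    while True:  # kept as a safety net; a second pass should be rare
        picked = [x for x in items if rng.random() < q]
        if len(picked) >= sample_size:
            rng.shuffle(picked)
            return picked[:sample_size]


if __name__ == "__main__":
    print(oversample_fraction(100, 10_000))   # somewhat above the naive 0.01
    print(len(take_sample(range(10_000), 100)))
```

Sampling at the naive rate `sample_size / total` would fall short roughly half the time and force another pass. The inflated rate trades a little extra work per run for a retry loop that almost never repeats, which is the trade-off an average-run-time comparison would measure.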



