Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/3702#issuecomment-67417721
Hm, I might be missing your point, but if we just take every nth point, then
the number of points taken from each partition will already be correct to +/- 1.
You do get samples a bit too close together at each partition boundary.
Oversampling might help you space that out a little, but does it matter
much? In a partition of 101 elements, taking every 5th, I will take 1, 5, ... ,
95, 100, and then start with 102 in the next partition. Taking 99 instead of
100 is only marginally better.
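A minimal Python sketch of the per-partition every-nth sampling described above, using plain lists to stand in for two consecutive partitions (the function name and partition contents are illustrative, not from the PR):

```python
def every_nth(partition, n):
    """Systematic sampling: take every nth element of a partition,
    starting with the first. The sample count is correct to +/- 1."""
    return partition[::n]

# Two consecutive partitions, mirroring the example in the comment:
# 101 elements each, sampled every 5th.
part1 = list(range(1, 102))      # elements 1..101
part2 = list(range(102, 203))    # elements 102..202

s1 = every_nth(part1, 5)
s2 = every_nth(part2, 5)

# Each partition contributes 21 samples from 101 elements, but the last
# sample of part1 and the first sample of part2 are adjacent, so the
# spacing at the partition boundary is 1 instead of the usual 5.
print(len(s1), s1[-1], s2[0])
```

This illustrates the boundary effect being discussed: within each partition the spacing is exactly n, and only at partition boundaries do two samples land closer together.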