[GitHub] spark pull request: SPARK-4547 [MLLIB] [WIP] OOM when making bins ...

srowen Tue, 16 Dec 2014 04:33:02 -0800

Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/3702#issuecomment-67152810
  
    @jkbradley Yes, let's do `numBins`; I'm changing it now. Say you have 100
elements in 10 partitions and want to sample down to 12. That means sampling
about every 100/12 ~= 8th element. But the simplistic approach samples 20
elements, since each of the 10 partitions (10 elements apiece) squashes its
elements 1-8 and 9-10 into 2 new elements. Ideally, elements 9-10 would be
grouped with elements 1-6 of the next partition, or something like that. But
stitching that together seems like more trouble than it's worth. Or am I being
pessimistic/lazy? Or maybe I misunderstand your idea of offsets into the
partition.
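To make the 20-vs-12 arithmetic concrete, here is a minimal sketch that contrasts the per-partition squashing with an offset-based assignment. It assumes partitions are modeled as plain Scala `Seq`s rather than a Spark RDD, and the names `naiveBins` and `offsetBins` are hypothetical, not MLlib code. The offset version computes each partition's global starting index from the sizes of earlier partitions and maps global index g to bin `g * numBins / total`, so a partition's tail can land in the same bin as the head of the next one. In Spark, a comparable global index could come from `RDD.zipWithIndex`, which runs an extra job to count partition sizes; whether that cost is worth it is exactly the trade-off being discussed.

```scala
// A minimal sketch, assuming partitions are modeled as plain Scala Seqs
// rather than a Spark RDD. All names here are hypothetical.
object BinSamplingSketch {

  // Simplistic approach: each partition independently squashes every `step`
  // consecutive elements into one group, so each partition's short tail
  // becomes an extra group of its own.
  def naiveBins(partitions: Seq[Seq[Int]], step: Int): Seq[Seq[Int]] =
    partitions.flatMap(part => part.grouped(step).toSeq)

  // Offset-aware approach: derive each partition's global starting index from
  // the sizes of the partitions before it, then map global index g to bin
  // g * numBins / total. Elements from adjacent partitions can share a bin.
  def offsetBins(partitions: Seq[Seq[Int]], numBins: Int): Seq[Seq[Int]] = {
    val sizes = partitions.map(_.size)
    val total = sizes.sum.toLong
    val offsets = sizes.scanLeft(0L)(_ + _) // start index of each partition
    val keyed = partitions.zip(offsets).flatMap { case (part, offset) =>
      part.zipWithIndex.map { case (x, i) => ((offset + i) * numBins / total, x) }
    }
    // Collect elements by bin id, in bin order.
    keyed.groupBy(_._1).toSeq.sortBy(_._1).map { case (_, kvs) => kvs.map(_._2) }
  }

  def main(args: Array[String]): Unit = {
    val partitions: Seq[Seq[Int]] = (1 to 100).grouped(10).toSeq // 10 partitions of 10
    println(naiveBins(partitions, 100 / 12).size) // 20
    println(offsetBins(partitions, 12).size)      // 12
  }
}
```

With 10 partitions of 10 elements and a step of 100 / 12 = 8, the naive version yields 2 groups per partition (elements 1-8 and 9-10), 20 in total, while the offset version yields exactly 12 bins of 8 or 9 elements each.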


