[GitHub] spark pull request: SPARK-4547 [MLLIB] OOM when making bins in Bin...

jkbradley Mon, 29 Dec 2014 10:57:29 -0800

Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/3702#issuecomment-68286086
  
    @srowen Sorry for the delay!  I'm really starting to wonder about this 
JIRA, though.  The collect() should return one BinaryLabelCounter per 
partition.  I'd assume people would have enough memory to store at least a few 
million BinaryLabelCounter instances on the driver.  Does that mean they have 
more than a few million partitions?
    
    Sorry I didn't think about this earlier, and perhaps I'm just confusing 
myself now---let me know what you think.  Is there an issue to solve here?
    
    Previously, I'd have said: "With the update, this LGTM"
    
    Also, I did think of one use case which may change things: We've been 
talking about people using these methods to make plots.  Do you think people 
ever use them to choose thresholds?  If so, then people might want much 
finer-grained ROC curves than we've been thinking, and it might be worthwhile 
to do a fancy implementation which avoids binning.
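
For reference, an unbinned ROC curve on a single machine can be sketched as below. This is plain Python and only illustrates what "one point per distinct score" means; it is not the distributed implementation being discussed, and `exact_roc` is a hypothetical name. It assumes the input contains at least one positive and one negative example.

```python
def exact_roc(scores_and_labels):
    """Return (fpr, tpr, threshold) triples, one per distinct score."""
    # Sort by score, highest first, so each threshold admits more examples.
    ordered = sorted(scores_and_labels, key=lambda sl: sl[0], reverse=True)
    total_pos = sum(1 for _, label in ordered if label > 0.5)
    total_neg = len(ordered) - total_pos

    # Start at the origin: a threshold above every score predicts no positives.
    points = [(0.0, 0.0, float("inf"))]
    tp = fp = 0
    i = 0
    while i < len(ordered):
        threshold = ordered[i][0]
        # Consume all examples tied at this score before emitting a point.
        while i < len(ordered) and ordered[i][0] == threshold:
            if ordered[i][1] > 0.5:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / total_neg, tp / total_pos, threshold))
    return points


data = [(0.9, 1), (0.8, 1), (0.7, 0), (0.6, 1), (0.4, 0), (0.4, 1), (0.2, 0)]
points = exact_roc(data)
```

Every distinct score becomes a candidate threshold, which is the granularity someone choosing a threshold would want, at the cost of a full sort and one curve point per distinct score.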
    
    At any rate, apologies for so much back-and-forth.



