[GitHub] spark pull request: [SPARK-3550][MLLIB] Disable automatic rdd cach...

staple Thu, 25 Sep 2014 11:50:38 -0700

Github user staple commented on the pull request:

    https://github.com/apache/spark/pull/2412#issuecomment-56865408
  
    @davies It looks like in your #2378 you already disabled caching for 
NaiveBayes and DecisionTree. The only difference from this patch is that I 
disabled caching for ALS as well.
    
    We discussed this a bit here: 
https://github.com/apache/spark/pull/2378#discussion_r17686208. I filed this 
ticket as a follow-up to the work on uncached input warnings 
(https://github.com/apache/spark/pull/2347). The warnings are only supposed to 
be printed if the input data is accessed repeatedly over many iterations during 
learning. That's not the case with ALS, so a warning shouldn't be printed 
there. But I can see there's a case for caching because the input data is 
accessed twice when constructing an intermediate representation of the data. I 
don't have a strong preference on whether or not we should cache in 
Python for the ALS learner.
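
    To make the trade-off concrete, here is a minimal sketch in plain Python. 
It does not use the Spark API: `LazyDataset`, `build_intermediate`, and the 
sample ratings are hypothetical stand-ins, assumed only for illustration. It 
shows that reading a lazily evaluated input twice recomputes it twice unless 
it is cached first.

```python
class LazyDataset:
    """Toy stand-in for a lazily evaluated dataset (hypothetical, not the Spark API)."""

    def __init__(self, compute):
        self._compute = compute      # function that produces the records
        self._cached = None          # materialized records, once cached
        self._use_cache = False
        self.evaluations = 0         # how many times the input was recomputed

    def cache(self):
        # Mark the dataset so the first evaluation is kept for later reads.
        self._use_cache = True
        return self

    def collect(self):
        if self._use_cache and self._cached is not None:
            return self._cached
        self.evaluations += 1
        result = list(self._compute())
        if self._use_cache:
            self._cached = result
        return result


def build_intermediate(ratings):
    # Two passes over the input, mirroring the "accessed twice" case above.
    users = {user for user, _, _ in ratings.collect()}
    products = {product for _, product, _ in ratings.collect()}
    return users, products


def make_ratings():
    # Made-up (user, product, rating) triples, for illustration only.
    return LazyDataset(lambda: [(1, 10, 5.0), (1, 11, 3.0), (2, 10, 4.0)])


uncached = make_ratings()
build_intermediate(uncached)

cached = make_ratings().cache()
build_intermediate(cached)

print(uncached.evaluations, cached.evaluations)  # prints "2 1"
```

    With only two passes the saving is one recomputation, which is why the 
case for caching here is weaker than for learners that iterate over the input 
many times.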
    
    If you are fine with continuing to cache in Python for ALS, then there's no 
more work to be done for this ticket, SPARK-3550.


