GitHub user staple commented on the pull request:
https://github.com/apache/spark/pull/2412#issuecomment-56865408
@davies It looks like your #2378 already disabled caching for NaiveBayes and
DecisionTree. The only difference in this patch is that it disables caching
for ALS as well.
We discussed this briefly here:
https://github.com/apache/spark/pull/2378#discussion_r17686208. I filed this
ticket as a follow-up to the work on uncached input warnings
(https://github.com/apache/spark/pull/2347). The warnings are only meant to be
printed when the input data is accessed repeatedly across many iterations
during learning. That's not the case with ALS, so no warning should be printed
there. Still, I can see a case for caching, because the input data is accessed
twice when constructing an intermediate representation of the data. I don't
have a strong preference on whether we should cache in Python for the ALS
learner.
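To make the trade-off concrete, here is a toy model in plain Python. It is not Spark's API or the actual ALS code: `LazyDataset` and `build_intermediate` are hypothetical stand-ins for a lazily evaluated input and for a construction step that reads that input twice. The sketch only counts how many times the input source gets recomputed with and without caching.

```python
class LazyDataset:
    """Hypothetical lazily evaluated dataset (not a Spark class)."""

    def __init__(self, source_fn, cached=False):
        self._source_fn = source_fn
        self._cached = cached
        self._materialized = None
        self.computations = 0  # number of times the source was recomputed

    def collect(self):
        # Reuse the materialized result if caching kept it from an earlier pass.
        if self._materialized is not None:
            return self._materialized
        self.computations += 1
        data = list(self._source_fn())
        if self._cached:
            self._materialized = data
        return data


def build_intermediate(ratings):
    # Hypothetical two-pass construction: one pass over the input for user
    # ids, a second pass for product ids.
    users = sorted({u for u, _, _ in ratings.collect()})
    products = sorted({p for _, p, _ in ratings.collect()})
    return users, products


def source():
    # Stand-in for an expensive upstream computation: (user, product, rating).
    return [(1, 10, 5.0), (2, 10, 3.0), (1, 20, 4.0)]


uncached = LazyDataset(source)
build_intermediate(uncached)
print(uncached.computations)  # 2: the source is recomputed on each pass

cached = LazyDataset(source, cached=True)
build_intermediate(cached)
print(cached.computations)  # 1: the second pass reuses the cached data
```

With only two passes, skipping the cache costs one extra recomputation of the input. That is why this case does not call for an "uncached input" warning, even though caching can still pay off.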
If you are fine with continuing to cache in Python for ALS, then there's no
more work to be done for this ticket, SPARK-3550.