GitHub user jkbradley commented on the issue:
https://github.com/apache/spark/pull/17090
@MLnick Thanks for showing those comparison numbers. If your
implementation is faster, then I'm happy going with it. I do wonder if we
might hit scalability issues with RDDs which we would not hit with DataFrames,
so it'd be worth revisiting a DF-based implementation later on.
In terms of the API, my main worry about
https://github.com/apache/spark/pull/12574 is that I haven't seen a full design
of how ALS would be plugged into CrossValidator. I still don't see how CV
could handle ALS unless we specialized it for recommendation. It was this
uncertainty which made me comment on
https://issues.apache.org/jira/browse/SPARK-13857 to recommend we go ahead and
merge basic recommendAll methods, while continuing to figure out a good design
for tuning.
Feel free to push back, but I would really like to see a sketch of how ALS
could plug into tuning. I haven't spent the time to do a literature review on
how tuning is generally done for recommendation, especially on the best ways to
split the data into folds.
> further methods to support recommending for all users (or items) in an
> input DF? like recommendForAllUsers(dataset: DataFrame, num: Int)
I do think this sounds useful, but I'm focused on feature parity w.r.t. the
RDD-based API right now. It'd be nice to add later, though that could be via
your proposed transform-based API.