GitHub user MLnick commented on the issue:
https://github.com/apache/spark/pull/12896
Your suggestion is, to me, the ideal solution. It's probably the more
common method of splitting "ratings" datasets for CV purposes.
I'm interested in working on it, but I think it would need a whole new,
dedicated cross-validator class. I'm not quite sure what the best approach is
for efficiency (see #14321 for a stratified sampling approach; it's aimed at
labels and is not efficient for this case, but the general concept might
apply). In short, it's a lot more effort and will take time. Perhaps
it could also start life outside of Spark as a package. Not sure on this yet,
but happy to collaborate on ideas!
Originally, this PR was intended for `2.0`, to at least make ALS usable with
the CV classes.