GitHub user mateiz commented on the pull request:
https://github.com/apache/spark/pull/1393#issuecomment-49069526
No, MLlib is not experimental; only the parts annotated with @Experimental
are. The reason is that we felt we could continue supporting these low-level
APIs indefinitely and add other ones later if we need to. Again, for real
users, API stability matters *much* more than you'd think -- there's nothing
more annoying than having to change your app just to pick up a software upgrade,
and it causes fragmentation of the userbase as users stick to an older version
instead of upgrading.
In this particular case, there are a few things we can do. We can think of
additions to the API here that preserve the old methods but add new versions of
predict. We can add a new class called LongALS or something like that, and have
these ones call it and get back a LongMatrixFactorizationModel. Or we can offer
a utility to generate unique IDs.
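To make the last option concrete, here is a minimal sketch of what an ID-generating utility might do, assuming the goal is to map arbitrary raw IDs (strings, longs) onto dense Int indices that the existing Int-based API can accept. The name `assign_unique_ids` is hypothetical and is not an MLlib API. Unlike hashing, this assigns each distinct input its own index, so collisions are impossible; the trade-off is that the mapping has to be stored so predictions can be translated back.

```python
def assign_unique_ids(raw_ids):
    """Map each distinct raw ID to a dense integer index 0, 1, 2, ...

    Hypothetical helper for illustration only. Indices are assigned in
    first-seen order, so distinct inputs always get distinct outputs.
    """
    mapping = {}
    for raw in raw_ids:
        if raw not in mapping:
            mapping[raw] = len(mapping)
    # Indices run from 0 to len(mapping) - 1; a signed 32-bit Int tops out
    # at 2**31 - 1, so more than 2**31 distinct IDs cannot be indexed.
    if len(mapping) > 2**31:
        raise OverflowError("too many distinct IDs for a signed 32-bit Int")
    return mapping


ids = assign_unique_ids(["alice", "bob", "alice", "carol"])
print(ids)  # {'alice': 0, 'bob': 1, 'carol': 2}
```

A distributed version would need a shuffle or an ordering step to hand out indices consistently across partitions, which is part of why it was floated as a separate utility rather than a change to ALS itself.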
The reason I was asking about hash collisions above is that even with
64-bit IDs, you're not guaranteed to be collision-free. With 2-3 billion users
you actually have a good chance of a collision. So if applications care about
that, they may not be okay with this solution either.
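The "good chance" can be quantified with the standard birthday-bound approximation, P(collision) ≈ 1 - exp(-n(n-1) / (2N)) for n IDs drawn uniformly from N possible values. The sketch below assumes the 64-bit IDs come from a hash that behaves like a uniformly random function; that modelling assumption is not from the original comment.

```python
import math


def collision_probability(n, bits=64):
    """Approximate probability that n uniformly random `bits`-bit values
    contain at least one duplicate (birthday bound)."""
    space = 2 ** bits
    # -expm1(-x) computes 1 - exp(-x) accurately even when x is small.
    return -math.expm1(-n * (n - 1) / (2 * space))


for users in (2_000_000_000, 3_000_000_000):
    print(f"{users:,} users: {collision_probability(users):.1%}")
```

Under this model, 2 billion users give roughly a 10% chance of at least one collision and 3 billion roughly 22%, which is why hashing into 64 bits alone does not guarantee unique IDs at that scale.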