[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

mateiz Tue, 15 Jul 2014 11:01:51 -0700

Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/1393#issuecomment-49069526
  
    No, MLlib is not experimental; only the parts annotated with @Experimental 
are. The reason is that we felt we could continue supporting these low-level 
APIs indefinitely and add other ones later if we need to. Again, for real 
users, API stability matters *much* more than you'd think -- there's nothing 
more annoying than having to change your app to implement a software upgrade, 
and it causes fragmentation of the userbase as users stick to an older version 
instead of upgrading.
    
    In this particular case, there are a few things we can do. We can think of 
additions to the API here that preserve the old methods but add new versions of 
predict. We can add a new class called LongALS or something like that, and have 
these ones call it and get back a LongMatrixFactorizationModel. Or we can offer 
a utility to generate unique IDs.
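    
    As a rough illustration of the last option, here is a minimal single-machine sketch of a unique-ID utility. The name `assign_unique_ids` is hypothetical and not part of MLlib; a real version would have to work on distributed data. It assumes the distinct keys fit in memory and number fewer than 2^31, so the resulting IDs still fit the existing Int-based API.

```python
from typing import Dict, Hashable, Iterable


def assign_unique_ids(keys: Iterable[Hashable]) -> Dict[Hashable, int]:
    """Map each distinct key to a dense integer ID (0, 1, 2, ...).

    Hypothetical helper for illustration only; not an MLlib API.
    IDs are assigned in order of first appearance, so two different
    keys can never share an ID (unlike hashing).
    """
    ids: Dict[Hashable, int] = {}
    for key in keys:
        if key not in ids:
            # The next free ID equals the number of keys seen so far.
            ids[key] = len(ids)
    return ids


if __name__ == "__main__":
    users = ["alice@example.com", "bob@example.com", "alice@example.com"]
    print(assign_unique_ids(users))
```

    Because the IDs are assigned rather than hashed, this avoids collisions entirely, at the cost of keeping the key-to-ID mapping around for later lookups.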
    
    The reason I was asking about hash collisions above is that even with 
64-bit IDs, you're not guaranteed to be collision-free. With 2-3 billion users 
you actually have a good chance of a collision. So if applications care about 
that, they may not be okay with this solution either.
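
    The collision estimate can be checked with the standard birthday-bound approximation. The sketch below assumes IDs are produced by a hash that is uniform over the 64-bit space, which is an idealization; the function name `collision_probability` is made up for this example. Under that assumption, the chance of at least one collision comes out at roughly 10% for 2 billion users and a bit over 20% for 3 billion.

```python
import math


def collision_probability(n_ids: int, bits: int = 64) -> float:
    """Approximate P(at least one collision) among n_ids values
    hashed uniformly into a space of 2**bits (birthday bound)."""
    space = 2.0 ** bits
    # Birthday approximation: P ~= 1 - exp(-n(n-1) / (2 * space)).
    # expm1 keeps precision when the exponent is close to zero.
    return -math.expm1(-n_ids * (n_ids - 1) / (2.0 * space))


if __name__ == "__main__":
    for n in (2_000_000_000, 3_000_000_000):
        print(f"{n:,} IDs in 64 bits: {collision_probability(n):.1%}")
```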



