Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/1393#issuecomment-49072664
Yeah, API stability is very important. I keep banging on about the flip side
-- freezing an API that may still need to change creates a different, equally
important problem. I'm sure everyone gets that; it's a judgment call and a trade-off.
I will change the PR to preserve the existing methods and add new ones.
That's the version we can consider and merge or not. I'm not offended if nobody
else is feeling this one; I can always fork/wrap this aspect to fit what I need
it to do. (And I have other API suggestions I'd rather spend time on, if anything.)
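For concreteness, here's roughly the shape I have in mind. This is only a sketch -- `Rating`, `ExampleALS`, and `trainLong` are made-up names for illustration, not the actual MLlib signatures: the existing Int-based entry point keeps its signature and simply widens IDs before delegating to a new Long-based method.

    // Hypothetical rating type parameterized on the ID type (illustration only).
    case class Rating[ID](user: ID, product: ID, rating: Double)

    object ExampleALS {
      // Existing-style method: Int IDs, signature unchanged so current callers keep compiling.
      def train(ratings: Seq[Rating[Int]], rank: Int, iterations: Int): Unit = {
        // Widen the 32-bit IDs and delegate to the Long-based version.
        trainLong(ratings.map(r => Rating(r.user.toLong, r.product.toLong, r.rating)),
                  rank, iterations)
      }

      // New method: Long IDs, so callers with more than 2^31 distinct entities
      // aren't forced to collide their IDs up front.
      def trainLong(ratings: Seq[Rating[Long]], rank: Int, iterations: Int): Unit = {
        // ... factorization would go here ...
      }
    }

The two methods need distinct names anyway, since overloads differing only in the collection's element type clash after erasure.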
I wouldn't want to add the overhead of a separate set of implementations
just for 64-bit values. Users would have a hard time understanding the
difference and choosing between them.
3 billion people is a lot! It could happen, yes; maybe not with people, but
with, say, URLs. *If* collisions mattered much, then with many billions of
things you can't use the ALS implementation as it stands, since _most_ IDs
would collide no matter how you map or hash them into 32 bits. That's the best
motivation I can offer for this change.
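To put rough numbers on that (back-of-the-envelope arithmetic on my part, not anything measured): if n distinct entities are hashed uniformly into the 2^32 possible Int values, the chance that a given entity shares its hash with at least one other is about 1 - exp(-n / 2^32).

    // Illustrative estimate only: fraction of n entities whose 32-bit hash
    // is not unique, under uniform random hashing into 2^32 slots.
    object Int32CollisionEstimate {
      private val m = math.pow(2, 32)   // size of the 32-bit ID space

      def fractionCollided(n: Double): Double = 1.0 - math.exp(-n / m)

      def main(args: Array[String]): Unit = {
        Seq(1e9, 3e9, 1e10).foreach { n =>
          println(f"n = ${n}%.0e entities -> ~${fractionCollided(n) * 100}%.0f%% share a 32-bit hash")
        }
      }
    }

That works out to roughly 20% at 1 billion, 50% at 3 billion, and 90% at 10 billion, which is what I mean by "most" once you're into many billions of things.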