Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/1393#issuecomment-49104793
I didn't suggest having a new implementation for long IDs, only a new API.
Both can run on the same implementation (e.g. the current Int-based one
widens the Ints to Longs and calls the new one). This is a much more sensible
way to evolve the API, and it's very common in other software. All our MLlib
APIs were designed to support this kind of evolution (e.g. you set your
parameters using a builder pattern, where we can add new methods, and the
top-level API is just functions you can call, which we can easily map to more
complex versions later).
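As a rough sketch of the delegation idea (class and method names here are illustrative, not MLlib's actual API): the Long-based entry point carries the one real implementation, and the existing Int-based method survives unchanged by widening its arguments and delegating.

```java
// Hypothetical sketch of evolving an Int-based API to Long IDs without
// breaking callers. Names (Recommender, predict) are made up for illustration.
public class Recommender {
    // New API: the single real implementation works on long IDs.
    public double predict(long userId, long productId) {
        // A stand-in for the real model lookup, just so the sketch runs.
        return (userId + productId) % 5;
    }

    // Old API, kept for backwards compatibility: ints widen losslessly
    // to longs, so this overload is a one-line delegation.
    public double predict(int userId, int productId) {
        return predict((long) userId, (long) productId);
    }
}
```

Callers compiled against the old Int-based signature keep working, while new code can use the wider ID space; there is only one implementation to maintain.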
The place I'm coming from is that there are *far* more complex APIs than
ours that have retained backwards compatibility over decades, and were
maintained by similar-sized teams. One great example is Java's class library,
which is not only a great library but has also remained compatible since 1.0.
There are well-known ways to retain compatibility while still improving an
API, such as adding a new package (e.g. java.nio vs. java.io). I would be
totally fine doing that with MLlib as we gain experience with it, but there's
no reason to break the old API in the process. Again, I feel that people from
today's tech company world think way too much about "perfecting" an API by
repeatedly tweaking it; while that works within a single engineering team, it
doesn't work for software that you expect someone else to use.