GitHub user rnowling commented on the pull request:
https://github.com/apache/spark/pull/1964#issuecomment-53570185
@mengxr, @yu-iskw
I think it is valuable to contribute distance metrics to Breeze, but some
of the metrics provided by @yu-iskw may not be of interest to Breeze. If MLlib
provides its own wrapper, we can delegate to Breeze for the distance metrics
available there and provide our own implementations for the others.
There was interest on the mailing list in different distance metrics for
KMeans. I think this PR could form the basis of a solution for that. My
main complaint is that the distance metrics implemented in this PR expect MLlib
Vectors, not Breeze vectors. Before this is committed, I think we should
figure out how to generalize these metrics to Breeze vectors. Maybe add
distance(breeze, breeze) functions to @yu-iskw's implementation, or make
Breeze vectors the default type and provide an implicit conversion from MLlib
vectors to Breeze vectors?
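
As a rough sketch of the second option (not code from this PR): the metric is written once against the Breeze-style type, and an implicit conversion lets MLlib-style vectors be passed directly. `MllibVector`, `BreezeVector`, `Distances`, and `Conversions` are hypothetical stand-ins so the snippet compiles without Spark or Breeze on the classpath; the real types would be MLlib's `Vector` and Breeze's `Vector[Double]`.

```scala
import scala.language.implicitConversions

// Hypothetical stand-ins for the real MLlib and Breeze vector types.
final case class MllibVector(values: Array[Double])
final case class BreezeVector(data: Array[Double])

object Distances {
  // The metric is implemented once, against the Breeze-style type.
  def euclidean(a: BreezeVector, b: BreezeVector): Double = {
    require(a.data.length == b.data.length, "vectors must have the same dimension")
    var sum = 0.0
    var i = 0
    while (i < a.data.length) {
      val d = a.data(i) - b.data(i)
      sum += d * d
      i += 1
    }
    math.sqrt(sum)
  }
}

object Conversions {
  // Assumed conversion: wraps the same backing array, so no data is copied.
  implicit def mllibToBreeze(v: MllibVector): BreezeVector = BreezeVector(v.values)
}
```

With `import Conversions._` in scope, `Distances.euclidean(MllibVector(Array(0.0, 0.0)), MllibVector(Array(3.0, 4.0)))` compiles and evaluates to 5.0. The first option, separate `distance(breeze, breeze)` overloads, avoids implicits at the cost of duplicating each signature.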
Once native support for Breeze vectors is available, we can start work on a
high-level API for distance metrics in KMeans and provide an implementation
using the code in this PR. A string-based API may be one option, but it would
not support distance metrics that require additional parameters (e.g., weighted
metrics or L-n norms).
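
To illustrate the limitation, here is a minimal sketch of a typed alternative, assuming metrics are modeled as objects that carry their own parameters. All names (`DistanceMetric`, `Minkowski`, `WeightedEuclidean`, `fromName`) are hypothetical and not taken from this PR or from MLlib, and plain `Array[Double]` stands in for the vector type.

```scala
// Hypothetical API sketch; none of these names come from the PR or from MLlib.
sealed trait DistanceMetric extends Serializable {
  def apply(a: Array[Double], b: Array[Double]): Double
}

// L-n norm distance: the order p is a parameter of the metric itself.
final case class Minkowski(p: Double) extends DistanceMetric {
  require(p >= 1.0, "p must be at least 1")
  def apply(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same dimension")
    var sum = 0.0
    var i = 0
    while (i < a.length) {
      sum += math.pow(math.abs(a(i) - b(i)), p)
      i += 1
    }
    math.pow(sum, 1.0 / p)
  }
}

// Weighted Euclidean distance: one weight per dimension.
final case class WeightedEuclidean(weights: Array[Double]) extends DistanceMetric {
  def apply(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == weights.length && b.length == weights.length,
      "vectors and weights must have the same dimension")
    var sum = 0.0
    var i = 0
    while (i < a.length) {
      val d = a(i) - b(i)
      sum += weights(i) * d * d
      i += 1
    }
    math.sqrt(sum)
  }
}

case object Euclidean extends DistanceMetric {
  def apply(a: Array[Double], b: Array[Double]): Double = Minkowski(2.0)(a, b)
}

object DistanceMetric {
  // A string lookup can only reach metrics that need no extra parameters.
  def fromName(name: String): Option[DistanceMetric] = name.toLowerCase match {
    case "euclidean" => Some(Euclidean)
    case "manhattan" => Some(Minkowski(1.0))
    case _           => None
  }
}
```

A caller could pass `Minkowski(3.0)` or `WeightedEuclidean(weights)` directly, while `fromName` remains available as a convenience for the parameter-free cases.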
What do you think?