Github user yu-iskw commented on the pull request:
https://github.com/apache/spark/pull/1964#issuecomment-53845417
@mengxr, @dlwh, @erikerlandson, @rnowling,
Thank you for your feedback.
I agree with the idea of implementing the distance metrics/measures in
Breeze. However, I am a bit worried about the interface of the distance
metrics in Spark.
I could implement a sample distance metric for Breeze, as below. However, I
couldn't think of a good interface for Spark.
https://gist.github.com/yu-iskw/37ae208c530f7018e048
I expect users to be able to switch distance metrics and to plug in their
own distance functions in some of MLlib's machine learning algorithms, such as
KMeans, since some of those algorithms take `RDD[Vector]` as input data. I think
users should be able to define their own distance functions with Spark's `Vector`
instead of Breeze's, for example `(v1: Vector, v2: Vector) => Double`. At the
very least, the interface shouldn't depend on Breeze.
## High-Level Use Case Sketch
```
KMeans().setMeasure(EuclideanDistance).run(RDD[Vector])
KMeans().setMeasure((v1: Vector, v2: Vector) => fun(v1, v2)).run(RDD[Vector])
```
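A runnable toy version of this builder-style API might look as follows. This is only a sketch of the chaining idea, not the current MLlib `KMeans`; `Array[Double]` stands in for Spark's `Vector` and a plain `Seq` for `RDD`, and `run` here just assigns points to the nearest of some given centers:

```scala
// Hypothetical builder-style API sketch (not the real MLlib KMeans).
object KMeansApiSketch {
  type Vector = Array[Double]                       // stand-in for Spark's Vector
  type DistanceMeasure = (Vector, Vector) => Double

  class KMeans {
    // Default measure: Euclidean distance.
    private var measure: DistanceMeasure =
      (a, b) => math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)

    // Returning `this` lets calls chain: KMeans().setMeasure(...).run(...).
    def setMeasure(m: DistanceMeasure): this.type = { measure = m; this }

    // Toy stand-in for the real algorithm: assign each point to the index
    // of the nearest center under the configured measure.
    def run(data: Seq[Vector], centers: Seq[Vector]): Seq[Int] =
      data.map(p => centers.indices.minBy(i => measure(p, centers(i))))
  }

  def main(args: Array[String]): Unit = {
    val centers = Seq(Array(0.0, 0.0), Array(10.0, 10.0))
    val data = Seq(Array(1.0, 1.0), Array(9.0, 9.0))

    // Default Euclidean measure.
    println(new KMeans().run(data, centers).mkString(","))

    // User-supplied Manhattan distance, swapped in via setMeasure.
    val manhattan: DistanceMeasure =
      (a, b) => a.zip(b).map { case (x, y) => math.abs(x - y) }.sum
    println(new KMeans().setMeasure(manhattan).run(data, centers).mkString(","))
  }
}
```

The point of the sketch is only that `setMeasure` accepts any `(Vector, Vector) => Double`, so built-in and user-defined metrics go through the same interface.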
I'm sorry if I misunderstood your opinion. If you have a better idea, could
you please show me mock code for the use case?
(Please forgive any rude expressions, if there are any. I don't mean them
that way; I'm just not good at English.)