Github user rnowling commented on the pull request:
https://github.com/apache/spark/pull/1964#issuecomment-52810571
Great work!
If a class implements the DistanceMetric trait, it should probably be
renamed ---DistanceMetric. For example, CosineDistanceMeasure should be
CosineDistanceMetric.
From a performance standpoint, I'm concerned about the vector conversions.
KMeans converts all vectors to BreezeVectors internally, so using these metrics
would require BreezeVectors -> Vectors -> BreezeVectors. CosineDistanceMeasure
actually converts the same vector (v1) twice instead of saving the conversion
for reuse!
Since the Breeze library should perform vector operations in a more
efficient manner, Breeze should be used in place of map where possible. E.g.,
WeightedEuclideanDistanceMeasure. Was map used there because Breeze doesn't
support element-wise operations?
I look forward to seeing this included in mllib!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]