GitHub user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/4622#issuecomment-99354269
General question: in Euclidean space, the negative squared Euclidean distance
is used as the similarity. If we want to use affinity propagation to cluster a
large number of samples in Euclidean space, it's impossible to materialize all
pairwise similarities, even though the similarity is symmetric. What are the
criteria for filtering out pairs that have very low similarity? Also, computing
all pairs of an `RDD[Vector]` is itself infeasible, since it's an O(N^2)
operation. How do people address this in practice?
I really like this algorithm, but I still have concerns about how people can
use it in practice.
Thanks.
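
To make the question concrete, here is a minimal sketch of one possible
filtering criterion: keep only the k most similar neighbors of each point. It
is plain Python rather than Spark code, and `neg_sq_dist`, `top_k_similarities`,
and the value of `k` are hypothetical names chosen for illustration, not
anything proposed in the pull request. It only bounds the number of stored
pairs to N * k; it still compares every pair, so it does not address the
O(N^2) computation concern raised above.

```python
import heapq


def neg_sq_dist(a, b):
    """Similarity from the question: negative squared Euclidean distance."""
    return -sum((x - y) ** 2 for x, y in zip(a, b))


def top_k_similarities(points, k):
    """Return sparse (i, j, similarity) triples.

    For each point i, keep only the k most similar points j != i.
    Assumption: top-k filtering is just one candidate criterion; a fixed
    similarity threshold would be another.
    """
    triples = []
    for i, p in enumerate(points):
        # Brute-force scan over all other points (still O(N^2) overall).
        candidates = (
            (neg_sq_dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        for s, j in heapq.nlargest(k, candidates):
            triples.append((i, j, s))
    return triples


# Two well-separated groups of two points each.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
sparse = top_k_similarities(points, k=1)
```

With `k=1`, each point keeps only its single nearest neighbor, so the 12
ordered pairs in this toy input shrink to 4 stored similarities.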