Github user dbtsai commented on the pull request:

    https://github.com/apache/spark/pull/4622#issuecomment-99354269
  
    General question. In Euclidean space, the negative squared error is used as 
the similarity. If we want to use affinity propagation to cluster a large 
number of samples in Euclidean space, it's impossible to materialize every 
pairwise similarity, even though the similarity is symmetric. What are the 
criteria for filtering out pairs that have very low similarity? Also, computing 
all the pairs of an `RDD[Vector]` is itself impractical since it's an O(N^2) 
operation, so how do people address this in practice? 
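
    To make the filtering question concrete, here is a minimal sketch of one 
possible criterion: keep only the `k` most similar neighbours of each point. 
This is an illustration under stated assumptions, not what the pull request 
implements. The names `SparseSimilarity`, `similarity`, and `topKSimilarities` 
are hypothetical, and plain Scala collections stand in for `RDD[Vector]`. 

```scala
// Hypothetical sketch: plain Scala collections stand in for RDD[Vector].
object SparseSimilarity {
  type Point = Array[Double]

  // Negative squared Euclidean distance between two points of equal dimension.
  def similarity(a: Point, b: Point): Double = {
    require(a.length == b.length, "points must have the same dimension")
    val squaredDistance = a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum
    -squaredDistance
  }

  // For each point i, keep only its k most similar neighbours as (i, j, s) triples.
  // NOTE: every pair is still evaluated here; only the stored output is bounded
  // (at most N * k triples), so the O(N^2) compute cost is not addressed.
  def topKSimilarities(points: IndexedSeq[Point], k: Int): Seq[(Int, Int, Double)] =
    points.indices.flatMap { i =>
      points.indices
        .filter(_ != i)
        .map(j => (i, j, similarity(points(i), points(j))))
        .sortBy { case (_, _, s) => -s } // highest similarity first
        .take(k)
    }
}
```

    This bounds the number of stored pairs to at most N * k, but it does not 
answer the second half of the question: the sketch still touches every pair, so 
avoiding the O(N^2) computation would need some other technique on top of it. 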
    
    I really like this algorithm, but I still have concerns about how people 
can use it in practice. 
    
    Thanks.

