Github user derrickburns commented on the pull request:

    https://github.com/apache/spark/pull/2634#issuecomment-70583162
  
    @mengxr
    
    One more thing regarding sparse vectors.  When sparse vectors are
    combined into cluster centers, the centers can become dense, which, in
    turn, can cause the running time of K-means clustering to skyrocket.
    
    To address this problem, one can project the cluster centers onto sparse
    vectors before performing the distance calculation.  My current version
    of the clusterer does this when the appropriate distance object is
    selected.
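
A minimal sketch of one way such a projection could work, assuming squared
Euclidean distance and a simple "keep the largest-magnitude entries" rule.
The names `project_to_sparse`, `squared_distance`, and the `keep` parameter
are hypothetical and chosen for illustration; they are not taken from the
pull request or from the repository mentioned in this thread, and the
actual clusterer may project differently.

```python
from typing import Dict, List

# Sparse vector stored as {index: nonzero value}. This representation is an
# assumption made for the sketch, not Spark's SparseVector class.
SparseVec = Dict[int, float]


def project_to_sparse(center: List[float], keep: int) -> SparseVec:
    """Keep only the `keep` largest-magnitude entries of a dense center."""
    ranked = sorted(range(len(center)), key=lambda i: abs(center[i]), reverse=True)
    return {i: center[i] for i in ranked[:keep] if center[i] != 0.0}


def squared_distance(point: SparseVec, center: SparseVec) -> float:
    """Squared Euclidean distance that visits only the stored entries."""
    total = 0.0
    # Indices present in the point (the center may or may not have them).
    for i, x in point.items():
        d = x - center.get(i, 0.0)
        total += d * d
    # Indices present only in the center.
    for i, c in center.items():
        if i not in point:
            total += c * c
    return total


# A center that has become dense after averaging many sparse points.
dense_center = [0.5, 0.0, 0.01, 2.0, 0.02]
sparse_center = project_to_sparse(dense_center, keep=2)
point = {0: 1.0, 3: 2.0}
print(squared_distance(point, sparse_center))
```

With this representation, the cost of each distance evaluation depends on
the number of stored entries rather than on the full dimension, at the
price of an approximation error from the dropped entries.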
    
    On Sun, Jan 18, 2015 at 7:59 PM, Derrick Burns <[email protected]>
    wrote:
    
    > @mengxr
    >
    > I have implemented several variants of Kullback-Leibler divergence in my
    > separate GitHub repository
    > <https://github.com/derrickburns/generalized-kmeans-clustering>.  These
    > variants are more efficient than the standard KL-divergence, which is
    > defined on R+^n, because they take advantage of extra knowledge of the
    > domain. I have used these variants with much success (i.e. much faster
    > running time) in my large scale clustering runs.
    >
    > On Sat, Jan 17, 2015 at 7:02 PM, UCB AMPLab <[email protected]>
    > wrote:
    >
    >> Test FAILed.
    >> Refer to this link for build results (access rights to CI server needed):
    >> https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25711/
    >> Test FAILed.
    >>
    >> —
    >> Reply to this email directly or view it on GitHub
    >> <https://github.com/apache/spark/pull/2634#issuecomment-70394598>.
    >>
    >
    >


