[GitHub] spark pull request: [SPARK-3424][MLLIB] cache point distances duri...

derrickburns Wed, 21 Jan 2015 16:18:07 -0800

Github user derrickburns commented on the pull request:

    https://github.com/apache/spark/pull/4144#issuecomment-70948894
  
    @mengxr
    
    FYI, I'm about to work on the performance of clustering millions of sparse 
vectors of very high dimension, particularly when using KL divergence, where 
smoothing is needed to deal with sparsity. 
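
To illustrate why smoothing matters here: KL divergence is undefined when the second distribution has a zero where the first does not, which happens constantly with sparse vectors. The following is a minimal sketch, not code from this PR or from MLlib. It assumes additive (Laplace-style) smoothing with a hypothetical pseudo-count `alpha`, and represents sparse count vectors as plain dicts. The function name `smoothed_kl` and its parameters are made up for illustration. It visits only the non-zero indices and handles the remaining coordinates in closed form, so the cost does not grow with the full dimension.

```python
import math


def smoothed_kl(p_counts, q_counts, dim, alpha=1.0):
    """KL(P || Q) for sparse count vectors with additive smoothing.

    p_counts, q_counts: dict mapping index -> non-negative count (hypothetical format)
    dim: total number of dimensions
    alpha: pseudo-count added to every coordinate (must be > 0)
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive for the divergence to be finite")

    # Normalizers after adding alpha to each of the dim coordinates.
    p_total = sum(p_counts.values()) + alpha * dim
    q_total = sum(q_counts.values()) + alpha * dim

    # Smoothed probability of a coordinate that is zero in the raw vector.
    base_p = alpha / p_total
    base_q = alpha / q_total

    # Coordinates that are zero in both vectors all contribute the same term.
    support = set(p_counts) | set(q_counts)
    kl = (dim - len(support)) * base_p * math.log(base_p / base_q)

    # Coordinates that are non-zero in at least one vector.
    for i in support:
        p = (p_counts.get(i, 0.0) + alpha) / p_total
        q = (q_counts.get(i, 0.0) + alpha) / q_total
        kl += p * math.log(p / q)
    return kl


# Usage: two sparse vectors in a 10-million-dimensional space.
point = {0: 3.0, 5: 1.0}
center = {0: 1.0, 7: 2.0}
divergence = smoothed_kl(point, center, dim=10_000_000, alpha=0.01)
```

Without the smoothing term, index 5 (present in the point, absent from the center) would make the divergence infinite.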
    
    > On Jan 21, 2015, at 3:37 PM, Xiangrui Meng <[email protected]> 
wrote:
    > 
    > @derrickburns This PR doesn't handle sparse centers. The dense one should 
work with feature dimension up to 10m, which may cover many cases already. We 
can solve that issue in a separate PR. Do the changes in this PR look good to 
you? (It seems that there is something wrong with Jenkins.)
    > 
    > Feel free to port the features and it would be great if you can help test 
the performance:)
    > 


