[GitHub] spark pull request #14956: [SPARK-17389] [ML] [MLLIB] KMeans speedup with be...

srowen Wed, 07 Sep 2016 11:42:06 -0700

Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14956#discussion_r77880357
  
    --- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/PowerIterationClustering.scala
 ---
    @@ -395,7 +395,7 @@ object PowerIterationClustering extends Logging {
         val points = v.mapValues(x => Vectors.dense(x)).cache()
         val model = new KMeans()
           .setK(k)
    -      .setSeed(0L)
    +      .setSeed(5L)
    --- End diff --
    
    The desired test result here depends on the seed, since some random 
initializations don't happen to produce the clustering that the test has in 
mind. 0 no longer worked after the change above but 5 did. This does indicate 
the clustering is different. Yes that sounds like a good quick science 
experiment, to verify more empirically that the clustering results here match 
what the paper advertises.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] spark pull request #14956: [SPARK-17389] [ML] [MLLIB] KMeans speedup with be...

Reply via email to