[GitHub] spark pull request #14956: [SPARK-17389] [ML] [MLLIB] KMeans speedup with be...

srowen Sat, 10 Sep 2016 01:11:56 -0700

Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14956#discussion_r78273585
  
    --- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/PowerIterationClustering.scala
 ---
    @@ -395,7 +395,7 @@ object PowerIterationClustering extends Logging {
         val points = v.mapValues(x => Vectors.dense(x)).cache()
         val model = new KMeans()
           .setK(k)
    -      .setSeed(0L)
    +      .setSeed(5L)
    --- End diff --
    
    Power iteration clustering uses k-means as an internal step, which is why 
the change affects PIC and its test. You're right, really a test should assert 
something that is true no matter how the implementation behaves. It'll probably 
never work for 100% of seeds though I agree, it should work for most. It worked 
for, I think, the third seed I tried. Maybe those odds are too low. Let me look 
into what goes wrong with seed 0 and see if it implies that something should be 
improved about the test or impl.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] spark pull request #14956: [SPARK-17389] [ML] [MLLIB] KMeans speedup with be...

Reply via email to