GitHub user shahidki31 opened a pull request:

    https://github.com/apache/spark/pull/21689

    Minor correction in the powerIterationSuite

    ## What changes were proposed in this pull request?
    
    Currently, the power iteration clustering test in ml maps the results to the 
labels 0 and 1 before asserting. Because the cluster labels produced by the 
algorithm need not match the mapped labels, the test can fail.
    Even when the mapping happens to match, there is no guarantee which set is 
assigned which cluster label: KMeans can assign label 0 to either set.
    
    The PowerIterationClusteringSuite in MLlib checks the clustering results 
without mapping them to a particular cluster label, as shown below.
    ```
    val predictions = Array.fill(2)(mutable.Set.empty[Long])
    model.assignments.collect().foreach { a =>
      predictions(a.cluster) += a.id
    }
    assert(predictions.toSet == Set((0 until n1).toSet, (n1 until n).toSet))
    ```
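
To illustrate why comparing sets of ids is insensitive to label order, here is a minimal, Spark-free sketch. The names `Assignment`, `LabelAgnosticCheck`, and `partitionOf` are hypothetical and are not part of Spark or of this patch; the sketch only assumes that each result row carries an id and a cluster label.

```scala
// Hypothetical stand-in for one clustering result row (not a Spark class).
final case class Assignment(id: Long, cluster: Int)

object LabelAgnosticCheck {
  // Group ids by cluster label, then discard the labels themselves,
  // leaving only the partition of ids into clusters.
  def partitionOf(assignments: Seq[Assignment]): Set[Set[Long]] =
    assignments.groupBy(_.cluster).values.map(_.map(_.id).toSet).toSet

  def main(args: Array[String]): Unit = {
    val n1 = 3
    val n = 6
    val expected: Set[Set[Long]] = Set(
      (0 until n1).map(_.toLong).toSet,
      (n1 until n).map(_.toLong).toSet
    )

    // Two labelings of the same partition, with labels 0 and 1 swapped.
    val labeling = (0 until n).map(i => Assignment(i.toLong, if (i < n1) 0 else 1))
    val swapped = labeling.map(a => a.copy(cluster = 1 - a.cluster))

    // Both pass, because only the grouping of ids is compared.
    assert(partitionOf(labeling) == expected)
    assert(partitionOf(swapped) == expected)
  }
}
```

An assertion written against fixed labels would accept only one of the two labelings above, which is the failure mode described in this pull request.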
    
    ## How was this patch tested?
    Existing tests
    
    
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/shahidki31/spark picTestSuiteMinorCorrection

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21689.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21689
    
----
commit 7b52f1ebbd4b7afd088c41695c61f4475911271e
Author: Shahid <shahidki31@...>
Date:   2018-07-01T19:39:19Z

    Minor correction in the powerIterationSuite

----

