GitHub user shahidki31 opened a pull request:
https://github.com/apache/spark/pull/21689
Minor correction in the powerIterationSuite
## What changes were proposed in this pull request?
Currently, the power iteration clustering test in ml maps the results to the
labels 0 and 1 for its assertions. Since the clustering output need not match
the mapped labels, the test case may fail.
Even when the mapping happens to match, there is no theoretical guarantee as to
which set receives which cluster label: KMeans can assign label 0 to either set.
PowerIterationClusteringSuite in MLlib checks the clustering results
without mapping them to a particular cluster label, as shown below.
```scala
val predictions = Array.fill(2)(mutable.Set.empty[Long])
model.assignments.collect().foreach { a =>
  predictions(a.cluster) += a.id
}
assert(predictions.toSet == Set((0 until n1).toSet, (n1 until n).toSet))
```
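To illustrate why a label-independent check is more robust, here is a minimal sketch using plain Scala collections rather than Spark. The helper `partitionOf` and the sample data `runA` and `runB` are hypothetical names introduced only for this illustration; they are not part of the patch or of Spark.

```scala
// Hypothetical helper (not part of Spark): groups ids by cluster label and
// then discards the labels, leaving only the partition of ids.
def partitionOf(assignments: Seq[(Long, Int)]): Set[Set[Long]] =
  assignments
    .groupBy { case (_, cluster) => cluster }
    .values
    .map(_.map { case (id, _) => id }.toSet)
    .toSet

// Two clusterings of ids 0..3 that differ only in which label each group got.
val runA = Seq((0L, 0), (1L, 0), (2L, 1), (3L, 1))
val runB = Seq((0L, 1), (1L, 1), (2L, 0), (3L, 0))

// Label-dependent comparison: treats the two runs as different.
val sameLabels = runA.toSet == runB.toSet

// Label-independent comparison: looks only at how the ids are grouped.
val samePartition = partitionOf(runA) == partitionOf(runB)
```

Here `sameLabels` is false while `samePartition` is true, which is the situation the test needs to tolerate: the grouping is identical even though the labels are swapped.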
## How was this patch tested?
Existing tests
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/shahidki31/spark picTestSuiteMinorCorrection
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21689.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21689
----
commit 7b52f1ebbd4b7afd088c41695c61f4475911271e
Author: Shahid <shahidki31@...>
Date: 2018-07-01T19:39:19Z
Minor correction in the powerIterationSuite
----