Github user avulanov commented on a diff in the pull request:
https://github.com/apache/spark/pull/10806#discussion_r54820556
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala ---
@@ -224,142 +261,133 @@ class KMeans private (
/**
* Implementation of K-Means algorithm.
*/
- private def runAlgorithm(data: RDD[VectorWithNorm]): KMeansModel = {
+ private def runAlgorithm(
+ data: RDD[(DenseMatrix, DenseMatrix)],
+ centers: Array[VectorWithNorm]): KMeansModel = {
val sc = data.sparkContext
- val initStartTime = System.nanoTime()
-
- // Only one run is allowed when initialModel is given
- val numRuns = if (initialModel.nonEmpty) {
- if (runs > 1) logWarning("Ignoring runs; one run is allowed when initialModel is given.")
- 1
- } else {
- runs
- }
-
- val centers = initialModel match {
- case Some(kMeansCenters) => {
- Array(kMeansCenters.clusterCenters.map(s => new VectorWithNorm(s)))
- }
- case None => {
- if (initializationMode == KMeans.RANDOM) {
- initRandom(data)
- } else {
- initKMeansParallel(data)
- }
- }
- }
- val initTimeInSeconds = (System.nanoTime() - initStartTime) / 1e9
- logInfo(s"Initialization with $initializationMode took " + "%.3f".format(initTimeInSeconds) +
- " seconds.")
-
- val active = Array.fill(numRuns)(true)
- val costs = Array.fill(numRuns)(0.0)
-
- var activeRuns = new ArrayBuffer[Int] ++ (0 until numRuns)
+ var done = false
+ var costs = 0.0
var iteration = 0
-
val iterationStartTime = System.nanoTime()
- // Execute iterations of Lloyd's algorithm until all runs have converged
- while (iteration < maxIterations && !activeRuns.isEmpty) {
+ // Execute Lloyd's algorithm until converged or reached the max number of iterations
--- End diff --
Are there any differences in implementation between Lloyd's algorithm and
KMeans? It would be useful to explain them in a comment here.
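For context on the question above: Lloyd's algorithm is the standard iterative heuristic for the k-means objective. It alternates an assignment step (each point goes to its nearest center) and an update step (each center moves to the mean of its assigned points). So "k-means" names the problem, and Lloyd's algorithm is one way of solving it. The following is a minimal plain-Scala sketch of that loop, assuming dense `Array[Double]` points and a Euclidean convergence tolerance `epsilon`. `LloydSketch`, `run`, `closest` and `sqDist` are hypothetical illustrations, not MLlib's `runAlgorithm`, which in this PR operates on an `RDD[(DenseMatrix, DenseMatrix)]`.

```scala
object LloydSketch {
  type Point = Array[Double]

  // Squared Euclidean distance between two points of equal dimension.
  def sqDist(a: Point, b: Point): Double = {
    var s = 0.0
    var i = 0
    while (i < a.length) { val d = a(i) - b(i); s += d * d; i += 1 }
    s
  }

  // Index of the center closest to p (first one wins on ties).
  def closest(p: Point, centers: Array[Point]): Int =
    centers.indices.minBy(j => sqDist(p, centers(j)))

  /** Runs Lloyd's iterations from the given initial centers. Stops when no
    * center moves farther than epsilon, or after maxIterations iterations.
    * Returns the final centers and the number of iterations performed.
    */
  def run(points: Array[Point], initial: Array[Point],
          maxIterations: Int, epsilon: Double): (Array[Point], Int) = {
    var centers = initial.map(_.clone())
    var iteration = 0
    var done = false
    while (iteration < maxIterations && !done) {
      // Assignment step: accumulate per-center sums and counts.
      val dim = centers.head.length
      val sums = Array.fill(centers.length)(new Array[Double](dim))
      val counts = new Array[Long](centers.length)
      points.foreach { p =>
        val j = closest(p, centers)
        var i = 0
        while (i < dim) { sums(j)(i) += p(i); i += 1 }
        counts(j) += 1
      }
      // Update step: move each center to the mean of its points;
      // a center with no assigned points keeps its previous position.
      val updated = centers.indices.map { j =>
        if (counts(j) == 0) centers(j) else sums(j).map(_ / counts(j))
      }.toArray
      // Converged when every center moved at most epsilon (compared squared).
      done = centers.indices.forall(j => sqDist(centers(j), updated(j)) <= epsilon * epsilon)
      centers = updated
      iteration += 1
    }
    (centers, iteration)
  }
}
```

With points (0, 0), (0, 1), (10, 10), (10, 11) and initial centers (0, 0) and (10, 10), the sketch moves the centers to (0, 0.5) and (10, 10.5) in the first iteration and detects convergence in the second.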