shahid created SPARK-24484:
------------------------------

             Summary: Power Iteration Clustering is giving incorrect clustering results when there are multiple leading eigenvalues.
                 Key: SPARK-24484
                 URL: https://issues.apache.org/jira/browse/SPARK-24484
             Project: Spark
          Issue Type: Bug
          Components: ML, MLlib
    Affects Versions: 2.4.0
            Reporter: shahid

When the normalized affinity matrix has multiple leading eigenvalues, power iteration clustering (PIC) gives incorrect results. We should either raise an error or warn the user when PIC does not converge (i.e., when |\lambda_1/\lambda_2| = 1).

{code:java}
// Assumes the enclosing suite already imports scala.collection.mutable,
// org.apache.spark.sql.Row, PowerIterationClustering, and the test implicits.
test("Fail to converge: Multiple leading eigen values") {
  /*
   Graph:

         2
        /
       /
      1        3 - - 4

   Adjacency matrix:

        [(0, 1, 0, 0),
         (1, 0, 0, 0),
    A =  (0, 0, 0, 1),
         (0, 0, 1, 0)]
  */
  // Two disconnected edges: 1-2 and 3-4, each with weight 1.0.
  val data = Seq[(Long, Long, Double)](
    (1, 2, 1.0),
    (3, 4, 1.0)
  ).toDF("src", "dst", "weight")

  val result = new PowerIterationClustering()
    .setK(2)
    .setMaxIter(20)
    .setInitMode("random")
    .setWeightCol("weight")
    .assignClusters(data)
    .select('id, 'cluster)

  // Collect the vertex ids assigned to each of the two clusters.
  val predictions = Array.fill(2)(mutable.Set.empty[Long])
  result.collect().foreach {
    case Row(id: Long, cluster: Integer) => predictions(cluster) += id
  }

  // Expected clustering: {1, 2} and {3, 4}.
  assert(predictions.toSet == Set(Array(1, 2).toSet, Array(3, 4).toSet))
}
{code}
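
The underlying effect can be illustrated without Spark. The following is a minimal sketch, not Spark's implementation: a plain-Scala power iteration on the row-normalized affinity matrix W = D^-1 A of the graph in this report. Every vertex has degree 1, so W equals A, and its eigenvalues are 1, 1, -1, -1, meaning the largest magnitude is shared. The object name `PicOscillationSketch`, the helpers `step` and `l1Distance`, and the starting vector are hypothetical choices made for illustration.

```scala
object PicOscillationSketch {
  // Row-normalized affinity matrix W = D^-1 A for edges 1-2 and 3-4.
  // All degrees are 1, so W is identical to the adjacency matrix A.
  val w: Array[Array[Double]] = Array(
    Array(0.0, 1.0, 0.0, 0.0),
    Array(1.0, 0.0, 0.0, 0.0),
    Array(0.0, 0.0, 0.0, 1.0),
    Array(0.0, 0.0, 1.0, 0.0)
  )

  // One power-iteration step: multiply by W, then rescale to unit L1 norm.
  def step(v: Array[Double]): Array[Double] = {
    val next = w.map(row => row.zip(v).map { case (a, b) => a * b }.sum)
    val norm = next.map(math.abs).sum
    next.map(_ / norm)
  }

  // L1 distance between two vectors of equal length.
  def l1Distance(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => math.abs(x - y) }.sum

  def main(args: Array[String]): Unit = {
    // Arbitrary (hypothetical) start vector with unit L1 norm.
    val v0 = Array(0.125, 0.375, 0.0625, 0.4375)
    val v1 = step(v0)
    val v2 = step(v1)
    println(l1Distance(v0, v1)) // 1.25: one step swaps entries within each pair
    println(l1Distance(v0, v2)) // 0.0: two steps return to the start vector
  }
}
```

In this sketch the iterate alternates between two vectors instead of settling on one, so the entries seen at any given iteration depend on the starting vector rather than on the graph structure. A check of this kind (comparing an iterate with the one two steps earlier) is one possible way to detect the situation and warn the user; whether it fits Spark's existing stopping criterion is an open question here.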