Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21119#discussion_r184874390

    --- Diff: python/pyspark/ml/clustering.py ---
    @@ -1156,6 +1156,205 @@ def getKeepLastCheckpoint(self):
             return self.getOrDefault(self.keepLastCheckpoint)

    +@inherit_doc
    +class PowerIterationClustering(HasMaxIter, HasPredictionCol, JavaTransformer, JavaParams,
    +                               JavaMLReadable, JavaMLWritable):
    +    """
    +    .. note:: Experimental
    +
    +    Power Iteration Clustering (PIC), a scalable graph clustering algorithm developed by
    +    <a href=http://www.icml2010.org/papers/387.pdf>Lin and Cohen</a>. From the abstract:
    +    PIC finds a very low-dimensional embedding of a dataset using truncated power
    +    iteration on a normalized pair-wise similarity matrix of the data.
    +
    +    PIC takes an affinity matrix between items (or vertices) as input.  An affinity matrix
    +    is a symmetric matrix whose entries are non-negative similarities between items.
    +    PIC takes this matrix (or graph) as an adjacency matrix.  Specifically, each input row
    +    includes:
    +
    +     - :py:attr:`idCol`: vertex ID
    +     - :py:attr:`neighborsCol`: neighbors of vertex in :py:attr:`idCol`
    +     - :py:attr:`similaritiesCol`: non-negative weights (similarities) of edges between the
    +        vertex in :py:attr:`idCol` and each neighbor in :py:attr:`neighborsCol`
    +
    +    PIC returns a cluster assignment for each input vertex.  It appends a new column
    +    :py:attr:`predictionCol` containing the cluster assignment in :py:attr:`[0,k)` for
    +    each row (vertex).
    +
    +    .. note::
    +
    +     - [[PowerIterationClustering]] is a transformer with an expensive [[transform]] operation.
    +        Transform runs the iterative PIC algorithm to cluster the whole input dataset.
    +     - Input validation: This validates that similarities are non-negative but does NOT validate
    +        that the input matrix is symmetric.
    +
    +    .. seealso:: <a href=http://en.wikipedia.org/wiki/Spectral_clustering>
    +        Spectral clustering (Wikipedia)</a>
    --- End diff --

    You can check other places using `seealso`:

    ```python
        .. seealso:: `Spectral clustering \
            <http://en.wikipedia.org/wiki/Spectral_clustering>`_
    ```
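
    The same HTML-style anchor also appears earlier in the docstring (the Lin and Cohen link), so it may need the same treatment. As a rough illustration of the rewrite being suggested, here is a small sketch that turns `<a href=URL>label</a>` into the reStructuredText form `` `label <URL>`_ ``. Note that `html_anchor_to_rst` is a hypothetical helper written for this comment only; it is not part of PySpark or Sphinx, and the regular expression assumes simple anchors with no nested tags.

    ```python
    import re

    # Hypothetical helper (not part of PySpark): matches simple anchors such as
    # <a href=URL>label</a>, with or without quotes around the URL.
    _ANCHOR = re.compile(r"""<a\s+href=["']?([^"'>\s]+)["']?\s*>\s*(.*?)\s*</a>""", re.DOTALL)


    def html_anchor_to_rst(text):
        """Rewrite each HTML anchor in `text` as an RST hyperlink: `label <URL>`_."""
        def repl(match):
            url, label = match.group(1), match.group(2)
            # Collapse whitespace in case the label was wrapped across lines.
            label = " ".join(label.split())
            return "`{} <{}>`_".format(label, url)
        return _ANCHOR.sub(repl, text)


    # The two anchors taken from the docstring in the diff above.
    seealso_fragment = (
        ".. seealso:: <a href=http://en.wikipedia.org/wiki/Spectral_clustering>\n"
        "    Spectral clustering (Wikipedia)</a>"
    )
    paper_fragment = "<a href=http://www.icml2010.org/papers/387.pdf>Lin and Cohen</a>"

    print(html_anchor_to_rst(seealso_fragment))
    print(html_anchor_to_rst(paper_fragment))
    ```

    The sketch joins the label and URL onto one line; in the actual docstring the link can still be wrapped with a trailing backslash, as in the snippet above, to stay within the line-length limit.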