GitHub user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21119#discussion_r184839128
  
    --- Diff: python/pyspark/ml/clustering.py ---
    @@ -1156,6 +1156,204 @@ def getKeepLastCheckpoint(self):
             return self.getOrDefault(self.keepLastCheckpoint)
     
     
    +@inherit_doc
    +class PowerIterationClustering(HasMaxIter, HasPredictionCol, 
JavaTransformer, JavaParams,
    +                               JavaMLReadable, JavaMLWritable):
    +    """
    +    .. note:: Experimental
    +    Power Iteration Clustering (PIC), a scalable graph clustering 
algorithm developed by
    +    <a href=http://www.icml2010.org/papers/387.pdf>Lin and Cohen</a>. From 
the abstract:
    +    PIC finds a very low-dimensional embedding of a dataset using 
truncated power
    +    iteration on a normalized pair-wise similarity matrix of the data.
    +
    +    PIC takes an affinity matrix between items (or vertices) as input.  An 
affinity matrix
    +    is a symmetric matrix whose entries are non-negative similarities 
between items.
    +    PIC takes this matrix (or graph) as an adjacency matrix.  
Specifically, each input row
    +    includes:
    +
    +     - :py:class:`idCol`: vertex ID
    --- End diff --
    
    Should this be ```:py:attr:`idCol` ```, since `idCol` is a param rather than a class? The same applies to ```:py:class:`neighborsCol` ``` below, etc.
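
For illustration only, a minimal sketch of what the suggested docstring entries might look like. The class name `DocRoleSketch` is a hypothetical stand-in, not the class from the PR, and the `neighborsCol` description is left as a placeholder because it is not shown in this hunk. The only point is the role: Sphinx's `:py:attr:` is meant for attributes, while `:py:class:` is meant for classes.

```python
class DocRoleSketch(object):
    """
    Hypothetical stand-in used only to show the docstring roles.

    Specifically, each input row includes:

     - :py:attr:`idCol`: vertex ID
     - :py:attr:`neighborsCol`: (description as in the PR, not shown here)
    """


# Print the docstring so the role markup can be inspected by eye.
print(DocRoleSketch.__doc__)
```

With `:py:attr:`, Sphinx tries to cross-reference the names as attributes instead of looking for classes named `idCol` or `neighborsCol`.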


---

