[GitHub] spark pull request: [SPARK-6227] [MLlib] [PySpark] Implement PySpa...

dusenberrymw Wed, 12 Aug 2015 11:56:47 -0700

Github user dusenberrymw commented on a diff in the pull request:

    https://github.com/apache/spark/pull/7963#discussion_r36899299
  
    --- Diff: python/pyspark/mllib/linalg/distributed.py ---
    @@ -352,6 +465,59 @@ def toBlockMatrix(self, rowsPerBlock=1024, colsPerBlock=1024):
                                                                colsPerBlock)
             return BlockMatrix(java_block_matrix, rowsPerBlock, colsPerBlock)
     
    +    def computeSVD(self, k, computeU=False, rCond=1e-9):
    +        """
    +        Computes the singular value decomposition of the IndexedRowMatrix.
    +
    +        The given row matrix A of dimension (m X n) is decomposed into
    +        U * s * V'T where
    +
    +        * U: (m X k) (left singular vectors) is a IndexedRowMatrix
    +             whose columns are the eigenvectors of (A X A')
    +        * s: DenseVector consisting of square root of the eigenvalues
    +             (singular values) in descending order.
    +        * v: (n X k) (right singular vectors) is a Matrix whose columns
    +             are the eigenvectors of (A' X A)
    +
    +        For more specific details on implementation, please refer
    +        the scala documentation.
    +
    +        :param k: Set the number of singular values to keep.
    +        :param computeU: Whether or not to compute U. If set to be
    +                         True, then U is computed by A * V * s^-1
    +        :param rCond: Reciprocal condition number. All singular values
    +                      smaller than rCond * s[0] are treated as zero
    +                      where s[0] is the largest singular value.
    +        :returns: SingularValueDecomposition object
    +
    +        >>> data = [(0, (3, 1, 1)), (1, (-1, 3, 1))]
    +        >>> irm = IndexedRowMatrix(sc.parallelize(data))
    +        >>> svd_model = irm.computeSVD(2, True)
    +        >>> svd_model.U.rows.collect() # doctest: +NORMALIZE_WHITESPACE
    +        [IndexedRow(0, [-0.707106781187,0.707106781187]),\
    +        IndexedRow(1, [-0.707106781187,-0.707106781187])]
    +        >>> svd_model.s
    +        DenseVector([3.4641, 3.1623])
    +        >>> svd_model.V
    +        DenseMatrix(3, 2, [-0.4082, -0.8165, -0.4082, 0.8944, -0.4472, 0.0], 0)
    +        """
    +        j_model = self._java_matrix_wrapper.call(
    +            "computeSVD", int(k), bool(computeU), float(rCond))
    +        return SingularValueDecomposition(j_model)
    +
    +    def multiply(self, matrix):
    +        """
    +        Multiplies the given IndexedRowMatrix with another matrix.
    +
    +        :param matrix: Matrix to multiply with.
    +        :returns: IndexedRowMatrix
    +
    +        >>> mat = IndexedRowMatrix(sc.parallelize([(0, (0, 1)), (1, (2, 3))]))
    +        >>> mat.multiply(DenseMatrix(2, 2, [0, 2, 1, 3])).rows.collect()
    +        [IndexedRow(0, [2.0,3.0]), IndexedRow(1, [6.0,11.0])]
    +        """
    +        return IndexedRowMatrix(self._java_matrix_wrapper.call("multiply", matrix))
    --- End diff --
    
    I'd check that `matrix` is a `DenseMatrix` here as well.
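
    A minimal sketch of what such a check might look like, assuming an
    `isinstance` test that raises `TypeError`. The `DenseMatrix` class below
    is only a stand-in so the snippet runs without Spark, and
    `check_dense_matrix` is a hypothetical helper name, not something taken
    from the PR:

```python
class DenseMatrix(object):
    """Stand-in for pyspark.mllib.linalg.DenseMatrix (assumption: used here
    only so the sketch runs without a Spark installation)."""

    def __init__(self, numRows, numCols, values):
        self.numRows = numRows
        self.numCols = numCols
        self.values = list(values)


def check_dense_matrix(matrix):
    """Hypothetical helper: return `matrix` unchanged if it is a DenseMatrix,
    otherwise raise TypeError before anything is sent to the JVM."""
    if not isinstance(matrix, DenseMatrix):
        raise TypeError(
            "Only multiplication with DenseMatrix is supported, got %s"
            % type(matrix).__name__)
    return matrix
```

    In `multiply`, a check along these lines would sit just before the
    `self._java_matrix_wrapper.call("multiply", matrix)` call, so that an
    unsupported argument fails with a Python-side error message.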



