GitHub user brkyvz opened a pull request:

    https://github.com/apache/spark/pull/3319

    [SPARK-4409][MLlib] Additional Linear Algebra Utils

    This PR adds a small set of local matrix manipulation and generation 
methods that would help further development of algorithms on top of 
BlockMatrix (SPARK-3974), such as Randomized SVD, and Multi Model Training 
(SPARK-1486).
    The proposed methods for addition are:
    
    For `Matrix`
     - map: maps the values in the matrix with a given function. Produces a new 
matrix.
     - update: the values in the matrix are updated with a given function. 
Occurs in place.
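
The difference between the two can be sketched as follows. This is a minimal illustration, not the code in this PR: `SimpleDense` is a hypothetical stand-in class, and the only thing assumed about it is column-major storage in a flat array.

```scala
// Hypothetical stand-in for a local dense matrix; NOT the MLlib class.
// Values are kept in a flat array in column-major order.
final class SimpleDense(val numRows: Int, val numCols: Int, val values: Array[Double]) {
  require(values.length == numRows * numCols, "values must hold numRows * numCols entries")

  // map: applies f to every entry and returns a NEW matrix; this one is untouched.
  def map(f: Double => Double): SimpleDense =
    new SimpleDense(numRows, numCols, values.map(f))

  // update: applies f to every entry IN PLACE and returns this matrix.
  def update(f: Double => Double): this.type = {
    var i = 0
    while (i < values.length) {
      values(i) = f(values(i))
      i += 1
    }
    this
  }
}

object MapVsUpdateDemo {
  def main(args: Array[String]): Unit = {
    val m = new SimpleDense(2, 2, Array(1.0, 2.0, 3.0, 4.0))
    val doubled = m.map(_ * 2.0) // m still holds 1, 2, 3, 4
    m.update(_ + 1.0)            // m itself is modified
    println(doubled.values.mkString(", "))
    println(m.values.mkString(", "))
  }
}
```

`map` costs one extra array allocation per call, while `update` avoids it at the price of mutating shared state.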
    
    Factory methods for `DenseMatrix`:
     - *zeros: Generate a matrix consisting of zeros
     - *ones: Generate a matrix consisting of ones
     - *eye: Generate an identity matrix
     - *rand: Generate a matrix consisting of i.i.d. uniform random numbers
     - *randn: Generate a matrix consisting of i.i.d. Gaussian random numbers
     - *diag: Generate a diagonal matrix from a supplied vector
    *These methods already exist in the factory methods for `Matrices`. 
However, in cases where we require a `DenseMatrix`, you constantly have to add 
`.asInstanceOf[DenseMatrix]` everywhere, which makes the code "dirtier". I 
propose moving these functions to factory methods for `DenseMatrix`, where the 
output will be a `DenseMatrix`. The factory methods for `Matrices` will then 
call these functions directly and output a generic `Matrix`.
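
The delegation described above might look roughly like this. It is a sketch under assumptions: `Mat`, `Dense`, and `Mats` are hypothetical stand-ins for `Matrix`, `DenseMatrix`, and `Matrices`, and the signatures (including passing a `java.util.Random`) are illustrative rather than the ones in this PR.

```scala
import java.util.Random

// Hypothetical stand-ins; NOT the MLlib types.
sealed trait Mat {
  def numRows: Int
  def numCols: Int
}

// Column-major dense storage.
final class Dense(val numRows: Int, val numCols: Int, val values: Array[Double]) extends Mat

// Concrete factories: the static return type is Dense, so callers need no cast.
object Dense {
  def zeros(rows: Int, cols: Int): Dense =
    new Dense(rows, cols, new Array[Double](rows * cols))

  def ones(rows: Int, cols: Int): Dense =
    new Dense(rows, cols, Array.fill(rows * cols)(1.0))

  def diag(d: Array[Double]): Dense = {
    val n = d.length
    val m = zeros(n, n)
    for (i <- 0 until n) m.values(i + i * n) = d(i) // entry (i, i)
    m
  }

  def eye(n: Int): Dense = diag(Array.fill(n)(1.0))

  // i.i.d. uniform samples in [0, 1)
  def rand(rows: Int, cols: Int, rng: Random): Dense =
    new Dense(rows, cols, Array.fill(rows * cols)(rng.nextDouble()))

  // i.i.d. standard Gaussian samples
  def randn(rows: Int, cols: Int, rng: Random): Dense =
    new Dense(rows, cols, Array.fill(rows * cols)(rng.nextGaussian()))
}

// Generic factories: same behavior, but the result is widened to Mat.
object Mats {
  def zeros(rows: Int, cols: Int): Mat = Dense.zeros(rows, cols)
  def ones(rows: Int, cols: Int): Mat = Dense.ones(rows, cols)
  def eye(n: Int): Mat = Dense.eye(n)
  def diag(d: Array[Double]): Mat = Dense.diag(d)
}
```

With this layout, `Dense.eye(3).values` compiles directly, whereas `Mats.eye(3)` only exposes the generic interface.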
    
    Factory methods for `SparseMatrix`:
     - speye: Identity matrix in sparse format. Saves a ton of memory when 
dimensions are large, especially in Multi Model Training, where each row 
must be multiplied by a scalar.
     - sprand: Generate a sparse matrix with a given density consisting of 
i.i.d. uniform random numbers.
     - sprandn: Generate a sparse matrix with a given density consisting of 
i.i.d. gaussian random numbers.
     - diag: Generate a diagonal matrix from a supplied vector. It is memory 
efficient because it stores only the diagonal. Again, very helpful in Multi 
Model Training.
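
To make the memory argument concrete, here is a minimal sketch of `speye` and `diag` in compressed sparse column (CSC) form. `Csc` is a hypothetical stand-in, not the `SparseMatrix` class from this PR, and the signatures are assumptions.

```scala
// Hypothetical CSC container; NOT the MLlib SparseMatrix.
// Column j owns the entries at positions colPtrs(j) until colPtrs(j + 1).
final case class Csc(
    numRows: Int,
    numCols: Int,
    colPtrs: Array[Int],
    rowIndices: Array[Int],
    values: Array[Double]) {

  // Look up entry (i, j); anything not stored is an implicit zero.
  def apply(i: Int, j: Int): Double =
    (colPtrs(j) until colPtrs(j + 1))
      .find(k => rowIndices(k) == i)
      .map(k => values(k))
      .getOrElse(0.0)
}

object Csc {
  // Diagonal matrix: column j stores exactly one entry, in row j.
  // For simplicity, zeros in d are stored explicitly.
  def diag(d: Array[Double]): Csc = {
    val n = d.length
    Csc(n, n, Array.range(0, n + 1), Array.range(0, n), d.clone())
  }

  // Sparse identity: a diagonal of ones.
  def speye(n: Int): Csc = diag(Array.fill(n)(1.0))
}
```

An n x n identity in this form holds n doubles plus 2n + 1 integers, compared with n * n doubles for the dense version.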
    
    Factory methods for `Matrices`:
     - Include all the factory methods given above, but return a generic 
`Matrix` rather than `SparseMatrix` or `DenseMatrix`.
     - horzCat: Horizontally concatenate matrices to form one larger matrix. 
Very useful in both Multi Model Training, and for the repartitioning of 
BlockMatrix.
     - vertCat: Vertically concatenate matrices to form one larger matrix. Very 
useful for the repartitioning of BlockMatrix.
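
The two concatenations can be sketched on column-major dense blocks as below. `ColMajor` and `Concat` are hypothetical names used for illustration only; the PR's methods also need to handle sparse inputs, which this sketch ignores.

```scala
// Hypothetical column-major dense block; NOT the MLlib class.
final case class ColMajor(numRows: Int, numCols: Int, values: Array[Double]) {
  def apply(i: Int, j: Int): Double = values(i + j * numRows)
}

object Concat {
  // Side by side: with column-major storage the value arrays are simply appended.
  def horzCat(ms: Seq[ColMajor]): ColMajor = {
    require(ms.nonEmpty, "need at least one matrix")
    val rows = ms.head.numRows
    require(ms.forall(_.numRows == rows), "all matrices must have the same number of rows")
    ColMajor(rows, ms.map(_.numCols).sum, ms.flatMap(_.values.toSeq).toArray)
  }

  // Stacked: every output column interleaves the matching columns of the inputs.
  def vertCat(ms: Seq[ColMajor]): ColMajor = {
    require(ms.nonEmpty, "need at least one matrix")
    val cols = ms.head.numCols
    require(ms.forall(_.numCols == cols), "all matrices must have the same number of columns")
    val rows = ms.map(_.numRows).sum
    val out = new Array[Double](rows * cols)
    var rowOffset = 0
    for (m <- ms) {
      for (j <- 0 until cols; i <- 0 until m.numRows)
        out(rowOffset + i + j * rows) = m(i, j)
      rowOffset += m.numRows
    }
    ColMajor(rows, cols, out)
  }
}
```

For example, stacking the row `[1 2]` on top of the 2 x 2 block with rows `[3 5]` and `[4 6]` gives a 3 x 2 matrix whose column-major values are 1, 3, 4, 2, 5, 6.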
    
    The names for these methods were taken from MATLAB.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/brkyvz/spark SPARK-4409

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3319.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3319
    
----
commit a14c0da0360b4202a2db787b85ce631562014f0d
Author: Burak Yavuz <[email protected]>
Date:   2014-11-17T09:33:36Z

    [SPARK-4409] Initial commit to add methods

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
