GitHub user mengxr opened a pull request:

    https://github.com/apache/spark/pull/3720

    [WIP][SPARK-3541][MLLIB] New ALS implementation with improved storage

    This is a WIP.
    
    This PR adds a new ALS implementation to `spark.ml` using the pipeline API,
    which should be able to scale to billions of ratings. Compared with the ALS
    implementation in `spark.mllib`, the new implementation
    
    1. uses the same algorithm,
    2. uses `Float` (rather than `Double`) for ratings,
    3. uses primitive arrays to reduce GC overhead,
    4. sorts and compresses the ratings in each block so that solving all the
       least-squares problems requires only O(k^2) memory, where k is the rank.
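Points 3 and 4 are illustrated below with a minimal pure-Python sketch. It is not the Scala code in this PR. It assumes a block is given as parallel lists of (src, dst, rating) values. The functions `compress_block`, `solve_block`, and `cholesky_solve`, the parameter `reg`, and the plain `reg * I` regularization are hypothetical choices made for illustration.

In the sketch, ratings are sorted by destination id and packed into typed `array` buffers (32-bit ints and floats, standing in for JVM primitive arrays). Each destination's normal equation is then accumulated into a single reused k x k buffer, so the solve step needs O(k^2) working memory no matter how many ratings the block holds.

```python
from array import array
import math


def compress_block(src_ids, dst_ids, ratings):
    """Sort one block's ratings by destination id and pack them into
    CSC-like primitive arrays (hypothetical layout, for illustration)."""
    order = sorted(range(len(dst_ids)), key=lambda i: dst_ids[i])
    uniq_dst = array("i")  # distinct destination ids, ascending
    ptrs = array("i")      # ratings of uniq_dst[j] live in [ptrs[j], ptrs[j+1])
    srcs = array("i")      # source id of each rating, grouped by destination
    vals = array("f")      # ratings stored as 32-bit floats
    for i in order:
        d = dst_ids[i]
        if not uniq_dst or uniq_dst[-1] != d:
            uniq_dst.append(d)
            ptrs.append(len(srcs))  # start offset of a new destination group
        srcs.append(src_ids[i])
        vals.append(ratings[i])
    ptrs.append(len(srcs))  # end offset of the last group
    return uniq_dst, ptrs, srcs, vals


def cholesky_solve(a, b, k):
    """Solve a x = b, where a is a symmetric positive-definite k x k matrix
    stored row-major in a flat list."""
    l = [0.0] * (k * k)
    for i in range(k):
        for j in range(i + 1):
            s = a[i * k + j] - sum(l[i * k + p] * l[j * k + p] for p in range(j))
            l[i * k + j] = math.sqrt(s) if i == j else s / l[j * k + j]
    y = [0.0] * k
    for i in range(k):  # forward substitution: L y = b
        y[i] = (b[i] - sum(l[i * k + p] * y[p] for p in range(i))) / l[i * k + i]
    x = [0.0] * k
    for i in reversed(range(k)):  # back substitution: L^T x = y
        x[i] = (y[i] - sum(l[p * k + i] * x[p] for p in range(i + 1, k))) / l[i * k + i]
    return x


def solve_block(uniq_dst, ptrs, srcs, vals, src_factors, k, reg):
    """For each destination, solve (sum x x^T + reg * I) y = sum r x.
    The k x k and k-length buffers are reused across destinations, so the
    working memory for the solves is O(k^2)."""
    ata = [0.0] * (k * k)
    atb = [0.0] * k
    out = {}
    for j, d in enumerate(uniq_dst):
        for t in range(k * k):
            ata[t] = 0.0
        for t in range(k):
            atb[t] = 0.0
        # Ratings of destination d are contiguous thanks to the sort above.
        for p in range(ptrs[j], ptrs[j + 1]):
            x = src_factors[srcs[p]]
            r = vals[p]
            for a in range(k):
                atb[a] += r * x[a]
                for c in range(k):
                    ata[a * k + c] += x[a] * x[c]
        for a in range(k):
            ata[a * k + a] += reg
        out[d] = cholesky_solve(ata, atb, k)
    return out


if __name__ == "__main__":
    # Toy block: three sources (ids 0-2) rating two destinations (ids 3 and 5).
    uniq, ptrs, srcs, vals = compress_block([0, 1, 0, 2], [5, 3, 3, 5], [4.0, 1.0, 2.0, 5.0])
    print(list(uniq), list(ptrs), list(srcs), list(vals))
    print(solve_block(uniq, ptrs, srcs, vals, [[1.0], [1.0], [1.0]], k=1, reg=1.0))
```

The design depends on grouping by destination. Because all ratings that feed one least-squares problem are contiguous, each problem can be built, solved, and discarded before the next one starts.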
    
    I kept the ALS in `spark.mllib` untouched for easy comparison. If the new
    implementation works well, it will replace the old one.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mengxr/spark SPARK-3541

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3720.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3720
    
----
commit 8ae86b5b7239d3bceac214dce854da3b48aeba3f
Author: Xiangrui Meng
Date:   2014-12-17T05:31:01Z

    add a working copy of the new ALS implementation

commit 1efaecfc968fb3f3eb1cf5d4c224aa17664c447a
Author: Xiangrui Meng
Date:   2014-12-17T06:46:23Z

    add example code

commit 3f2d81aae68e4a9dd095183c6fed4622a0fb0015
Author: Xiangrui Meng
Date:   2014-12-17T07:07:20Z

    add doc

----

