GitHub user debasish83 opened a pull request:

    https://github.com/apache/spark/pull/3221

    [MLLIB] [WIP] SPARK-2426: Quadratic Minimization for ALS

    ALS is a generic algorithm for matrix factorization which is equally 
applicable for both feature space and similarity space. Current ALS support L2 
regularization and positivity constraint. This PR introduces userConstraint and 
productConstraint to ALS and let's the user select different constraints for 
user and product solves. The supported constraints are the following:
    
    SMOOTH : default ALS with L2 regularization
    ELASTIC NET: ALS with Elastic Net regularization
    POSITIVE: ALS with positive factors
    BOUNDS: ALS with factors bounded within upper and lower bound (default 
within 0 and 1)
    EQUALITY: ALS with equality constraint (default the factors sum up to 1 and 
positive)
    
    First let's focus on the problem formulation. Both implicit and explicit 
feedback ALS formulation can be written as a quadratic minimization problem. 
The quadratic objective can be written as xtHx + ctx. Each of the respective 
constraints take the following form (for example sparsity constraint)
    
    minimize xtHx + ctx
    s.t ||x||1 <= c (SPARSE constraint)
    
    We rewrite the objective as f(x) = xtHx + ctx and the constraint as an 
indicator function g(x)
    Now minimization of f(x) + g(x) can be carried out using various forward 
backward splitting algorithms. We choose ADMM for this PR.
    
    For use-cases the PR is focused on the following:
    
    1. Sparse matrix factorization
    Example run:
    MASTER=spark://localhost:7077 ./bin/run-example mllib.MovieLensALS --rank 
20 --numIterations 10 --userConstraint SMOOTH --lambdaUser 0.065 
--productConstraint SPARSE --lambdaProduct 0.1 --kryo 
hdfs://localhost:8020/sandbox/movielens/
    
    2. Topic modeling using LSA
    References:
    2007 Sparse coding: 
papers.nips.cc/paper/2979-efficient-sparse-coding-algorithms.pdf
    2011 Sparse Latent Semantic Analysis LSA(some of it is implemented in 
Graphlab): 
    https://www.cs.cmu.edu/~xichen/images/SLSA-sdm11-final.pdf
    2012 Sparse Coding + MR/MPI Microsoft: 
http://web.stanford.edu/group/mmds/slides2012/s-hli.pdf
    Implementing the 20NG flow to validate the sparse coding result improvement 
over LDA based topic modeling.
    
    3. Topic modeling using PLSA
    Reference: 
    Tutorial on Probabilistic Topic Modeling: Additive Regularization for 
Stochastic Matrix Factorization
    The EQUALITY formulation with a Quadratic loss is an approximation to the 
KL divergence loss being used in PLSA. We are interested to see if it improves 
the result further as compared to the Sparse coding.
    
    Next steps:
    1. Move QuadraticMinimizerSuite.scala to breeze
    2. Scale the factorization rank and remove the need to construct H matrix
    3. Replace the quadratic loss xtHx + ctx with a Convex loss
    
    Detailed experiments are on the JIRA 
https://issues.apache.org/jira/browse/SPARK-2426

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/debasish83/spark qp-als

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3221.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3221
    
----
commit 0b3f0530702b7ca54b5152a3b65530113b2d538c
Author: Debasish Das <[email protected]>
Date:   2014-06-13T03:24:24Z

    jecos integrated as the default qpsolve in ALS; implicit tests are failing

commit 8ba4871ed44c5e971dddd1888be34fb5f950bf76
Author: Debasish Das <[email protected]>
Date:   2014-06-18T04:30:12Z

    Qp options added to Spark ALS: unbounded Qp, Qp with pos, Qp with bounds, 
Qp with smoothness, Qp with L1

commit dd912db0966c28501948cdd27e5271ff9b10a8c5
Author: Debasish Das <[email protected]>
Date:   2014-06-19T08:19:06Z

    Prepared branch of ALS-QP feature/runtime testing

commit 6dd320b170ffe7d6f6487bfe686192a672fe20c9
Author: Debasish Das <[email protected]>
Date:   2014-06-20T23:01:21Z

    QpProblem drivers in MovieLensALS

commit 48023c84fd4896899a2d0dfd25f7c33234551b22
Author: Debasish Das <[email protected]>
Date:   2014-06-21T01:23:41Z

    debug option for octave quadprog validation

commit e7e64b7741c001735b85b25bca0dc6cea534bed3
Author: Debasish Das <[email protected]>
Date:   2014-06-21T07:50:18Z

    L1 option added to ALS; Driver added to MovieLensALS

commit 84f1d67242e4ac6a7846f60c042269e31a57894a
Author: Debasish Das <[email protected]>
Date:   2014-06-21T09:33:02Z

    Qp with equality and bounds added to option 4 of ECOS based QpSolver

commit 90bca10bcbf255135b4a9ec5b68b848133298dc7
Author: Debasish Das <[email protected]>
Date:   2014-06-30T22:59:08Z

    Movielens runtime experiments for Spark Summit talk

commit 4e2c6235b52dd81a8a2e74182b2751c389201158
Author: Debasish Das <[email protected]>
Date:   2014-07-15T15:28:43Z

    ADMM based QuadraticMinimizer in mllib.optimization;Used in ALS

commit 3f93ee5741be230e6a0fcab1bac9459243e9355e
Author: Debasish Das <[email protected]>
Date:   2014-07-16T01:50:55Z

    Refactored to use com.github.ecos package

commit f2888465677028fd1e85eb3f591b2537514529a2
Author: Debasish Das <[email protected]>
Date:   2014-07-18T02:43:30Z

    moved interior point based qp-als to feature/ipmqp-als branch; preparing 
for distributed runs; rho=50 for equality constraint, default rho=1.0, alpha = 
1.0 (no over-relaxation) for convergence study

commit 21d79901ebb6e9399ebf192afda3e5cbbe782f31
Author: Debasish Das <[email protected]>
Date:   2014-08-02T07:15:34Z

    license cleanup; Copyright added to NOTICE

commit 13cb89b27963868227e940fd946e8faadfa32cb1
Author: Debasish Das <[email protected]>
Date:   2014-08-05T05:57:48Z

    Merge with HEAD

commit a12d92a3c3a950bd8782592cb8c797199aa1fdfc
Author: Debasish Das <[email protected]>
Date:   2014-08-07T21:36:25Z

    BSD license for Proximal algorithms

commit 02199a8939f421c10ffc688bfb0ce1e1a908e369
Author: Debasish Das <[email protected]>
Date:   2014-08-08T18:19:30Z

    LICENSE and NOTICE updates as per Legal

commit f43ed66127781a5e6669e059686eb4cb9c5c2e28
Author: Debasish Das <[email protected]>
Date:   2014-08-09T05:45:47Z

    Merge with master

commit c03dbeda9b3b1d0f182d3c1450bb1e8b3e0c9af3
Author: Debasish Das <[email protected]>
Date:   2014-08-13T06:42:56Z

    Merge branch 'feature/qp-als' of 
https://istg.vzvisp.com:8443/stash/scm/bda/spark into qp-als

commit c9d1fbf88337058f59fea958fb8f1aa17ea92c74
Author: Debasish Das <[email protected]>
Date:   2014-10-08T03:01:36Z

    Redesign of ALS API; userConstraint and productConstraint separated;

commit f2cab3e63a6269d9455ac07299c69bd68b7c957c
Author: Debasish Das <[email protected]>
Date:   2014-10-29T22:58:11Z

    delimiter added to MovieLensALS example;rho tuning in 
QuadraticMinimizer;Elastic net formulation in ALS, elastic net parameter not 
exposed to users

commit b2c9daca98f7766394596ca819813a06a8ec46b3
Author: Debasish Das <[email protected]>
Date:   2014-10-30T20:21:42Z

    rho as sqrt(eigenMin*eigenMax) for sparse, sqrt(eigenMax) for other 
formulations

commit c01f3e3dacec502b534a1b21f3b328f5586ddd02
Author: Debasish Das <[email protected]>
Date:   2014-10-31T20:48:39Z

    NNLS bug2

commit a9412079d9f05cf9514d62978f0b4525590b871b
Author: Debasish Das <[email protected]>
Date:   2014-11-01T15:26:09Z

    removed ecos dependency from mllib pom

commit 9b3951f558e5673eb475c575f14876421b5a3abc
Author: Debasish Das <[email protected]>
Date:   2014-11-05T01:23:09Z

    validate user/product on MovieLens dataset through user input and compute 
map measure along with rmse

commit cd3ab31cb9b244bae2b45396a6269ed1dc59151b
Author: Debasish Das <[email protected]>
Date:   2014-11-05T22:43:11Z

    merged with AbstractParams serialization bug

commit 4bbae0f248ca8747b47ecf852d5aba19c9b39dab
Author: Debasish Das <[email protected]>
Date:   2014-11-05T23:23:02Z

    comments fixed as per scalastyle

commit 9fa063e1eb172d68248e03797a54acc738543592
Author: Debasish Das <[email protected]>
Date:   2014-11-06T00:05:24Z

    import scala.math.round

commit 10cbb37a7881867d801ae6630ffc0d09b3feebf9
Author: Debasish Das <[email protected]>
Date:   2014-11-08T06:31:40Z

    provide ratio for topN product validation; generate MAP and prec@k metric 
for movielens dataset

commit f38a1b59e27907f2aa9bd732c5f9147b738d3a0f
Author: Debasish Das <[email protected]>
Date:   2014-11-08T06:45:13Z

    use sampleByKey for per user sampling

commit d144f57a58c9424365f1242f90961386c016641e
Author: Debasish Das <[email protected]>
Date:   2014-11-12T04:56:46Z

    recommendAll API to MatrixFactorizationModel, uses topK finding using 
BoundedPriorityQueue similar to RDD.top

commit 1e7e36e9158f5fcdf576ce87577fea05d11ea216
Author: Debasish Das <[email protected]>
Date:   2014-11-12T07:18:35Z

    Updated qp-als with irmetrics for experiments

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to