Github user debasish83 commented on the pull request:

    https://github.com/apache/spark/pull/3221#issuecomment-84522361
  
    @mengxr @dlwh @tmyklebu I integrated Breeze QuadraticMinimizer into the 
ml.ALS redesign. As a first step, I am comparing the performance of 
ml.CholeskySolver with Breeze QuadraticMinimizer. If no proximal operator is 
specified, QuadraticMinimizer behaves as a Cholesky solver.
    
    Compared to ml.CholeskySolver, the major difference is that we take the 
normal equation ne.ata and form a Breeze DenseMatrix (the full gram matrix), 
which is passed to the QuadraticMinimizer API. QuadraticMinimizer maintains its 
own gram/quasi-definite workspace: the solver does not mutate the ata sent from 
mllib but copies it into its workspace. I am not sure whether ml.CholeskySolver 
mutates ata through dposv or not.
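    A minimal sketch of the data flow described above, in plain Python rather than Scala/Breeze. It assumes ne.ata is stored as a packed upper triangle in column-major order (an assumption, not confirmed in this thread), and the names unpack_upper and cholesky_solve are hypothetical. It expands the packed values into a full symmetric gram matrix, then solves on a private copy so the caller's matrix is left untouched. This is not the Breeze implementation.

```python
import math


def unpack_upper(packed, k):
    """Expand a packed upper triangle (column-major, assumed layout)
    into a full symmetric k x k matrix stored as a list of rows."""
    full = [[0.0] * k for _ in range(k)]
    pos = 0
    for j in range(k):            # column index
        for i in range(j + 1):    # rows 0..j of that column
            full[i][j] = packed[pos]
            full[j][i] = packed[pos]
            pos += 1
    return full


def cholesky_solve(gram, rhs):
    """Solve gram * x = rhs for a symmetric positive definite gram.
    The factorization runs on a copy, so gram itself is not mutated."""
    k = len(rhs)
    work = [row[:] for row in gram]  # private workspace
    # Factor: the lower triangle of work becomes L with L * L^T = gram.
    for j in range(k):
        d = work[j][j] - sum(work[j][p] ** 2 for p in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        work[j][j] = math.sqrt(d)
        for i in range(j + 1, k):
            s = sum(work[i][p] * work[j][p] for p in range(j))
            work[i][j] = (work[i][j] - s) / work[j][j]
    # Forward substitution: L * y = rhs.
    y = [0.0] * k
    for i in range(k):
        s = sum(work[i][p] * y[p] for p in range(i))
        y[i] = (rhs[i] - s) / work[i][i]
    # Back substitution: L^T * x = y.
    x = [0.0] * k
    for i in reversed(range(k)):
        s = sum(work[p][i] * x[p] for p in range(i + 1, k))
        x[i] = (y[i] - s) / work[i][i]
    return x
```

    The copy costs one extra k x k allocation or fill per solve, which is the trade-off for leaving the caller's ata intact.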
    
    I set the seed to 0L so that both runs give exactly the same results.
    
    Breeze QuadraticMinimizer:
    
    unset solver; ./bin/spark-submit --master 
spark://TUSCA09LMLVT00C.local:7077 --class 
org.apache.spark.examples.mllib.MovieLensALS --jars 
~/.m2/repository/com/github/scopt/scopt_2.10/3.2.0/scopt_2.10-3.2.0.jar 
--total-executor-cores 1 
./examples/target/spark-examples_2.10-1.3.0-SNAPSHOT.jar --rank 50 
--numIterations 2 ~/datasets/ml-1m/ratings.dat
    
    Got 1000209 ratings from 6040 users on 3706 movies.
    Training: 800670, test: 199539.
    Quadratic minimization userConstraint SMOOTH productConstraint SMOOTH
    Running Breeze QuadraticMinimizer
    Running Breeze QuadraticMinimizer
    Test RMSE = 2.498508112623384.
     
    TUSCA09LMLVT00C:spark-qp-als v606014$ grep solveTime 
./work/app-20150321215010-0002/0/stderr 
    15/03/21 21:50:19 INFO ALS: solveTime 428.926 ms
    15/03/21 21:50:20 INFO ALS: solveTime 100.322 ms
    15/03/21 21:50:20 INFO ALS: solveTime 98.001 ms
    15/03/21 21:50:21 INFO ALS: solveTime 88.865 ms
    15/03/21 21:50:21 INFO ALS: solveTime 46.189 ms
    15/03/21 21:50:22 INFO ALS: solveTime 35.477 ms
    15/03/21 21:50:22 INFO ALS: solveTime 55.875 ms
    15/03/21 21:50:23 INFO ALS: solveTime 56.772 ms
    15/03/21 21:50:23 INFO ALS: solveTime 33.325 ms
    15/03/21 21:50:24 INFO ALS: solveTime 33.64 ms
    
    ML CholeskySolver:
    
    export solver=mllib; ./bin/spark-submit --master 
spark://TUSCA09LMLVT00C.local:7077 --class 
org.apache.spark.examples.mllib.MovieLensALS --jars 
~/.m2/repository/com/github/scopt/scopt_2.10/3.2.0/scopt_2.10-3.2.0.jar 
--total-executor-cores 1 
./examples/target/spark-examples_2.10-1.3.0-SNAPSHOT.jar --rank 50 
--numIterations 2 ~/datasets/ml-1m/ratings.dat
    
    Got 1000209 ratings from 6040 users on 3706 movies.
    Training: 800670, test: 199539.
    Quadratic minimization userConstraint SMOOTH productConstraint SMOOTH
    Test RMSE = 2.498508112623384.
    
    TUSCA09LMLVT00C:spark-qp-als v606014$ grep solveTime 
./work/app-20150321215729-0003/0/stderr 
    15/03/21 21:57:38 INFO ALS: solveTime 101.987 ms
    15/03/21 21:57:39 INFO ALS: solveTime 37.585 ms
    15/03/21 21:57:39 INFO ALS: solveTime 62.417 ms
    15/03/21 21:57:40 INFO ALS: solveTime 60.191 ms
    15/03/21 21:57:40 INFO ALS: solveTime 36.817 ms
    15/03/21 21:57:41 INFO ALS: solveTime 37.629 ms
    15/03/21 21:57:41 INFO ALS: solveTime 61.636 ms
    15/03/21 21:57:42 INFO ALS: solveTime 63.741 ms
    15/03/21 21:57:42 INFO ALS: solveTime 37.92 ms
    15/03/21 21:57:43 INFO ALS: solveTime 36.442 ms
    
    As in https://github.com/apache/spark/pull/5005, the runtimes are 
comparable apart from the difference in the first iteration.
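    One way to make this comparison repeatable is to parse the solveTime lines out of the executor stderr and separate the first solve from the rest. The sketch below is plain Python; the function names are hypothetical, and the only assumption about the log format is the "solveTime <number> ms" pattern visible in the output above.

```python
import re
from statistics import mean

# Matches the timing lines shown in the logs, e.g. "... ALS: solveTime 428.926 ms".
SOLVE_TIME = re.compile(r"solveTime\s+([0-9.]+)\s+ms")


def solve_times(lines):
    """Return the solveTime values (in ms) found in the given log lines."""
    times = []
    for line in lines:
        match = SOLVE_TIME.search(line)
        if match:
            times.append(float(match.group(1)))
    return times


def split_first(times):
    """Return (first solve time, mean of the remaining solve times)."""
    if len(times) < 2:
        raise ValueError("need at least two solveTime entries")
    return times[0], mean(times[1:])
```

    Running solve_times over each stderr file and comparing the two split_first results gives the first-solve gap and the steady-state averages separately.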
    
    Let's focus on Cholesky first but QuadraticMinimizer was designed to add 
the following features to ml.ALS and enhance the collaborative filtering 
capabilities:
    
    1. Add userConstraint and productConstraint to ml.ALS. I have already 
added them for my experiments, but right now only the SMOOTH constraint (the 
default) is activated
    2. For ANNLS we are still using mllib NNLS which hopefully will be moved to 
Breeze NNLS
    3. Sparse Coding through L2 and L1 constraints
    4. LSA with Quadratic Loss (Equality and positive constraints on users, 
bounds on products, column normalization on products after every ALS iteration)
    5. Probabilistic matrix factorization through bound constraints
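    For readers unfamiliar with proximal operators, the constraints in the list above correspond to simple per-vector maps. The sketch below shows three textbook ones in plain Python: projection onto the non-negative orthant, projection onto a box (bounds), and soft-thresholding for an L1 penalty. The function names are hypothetical and this is not the Breeze QuadraticMinimizer API.

```python
def prox_positive(x):
    """Project x onto the non-negative orthant (positivity constraint)."""
    return [max(v, 0.0) for v in x]


def prox_bounds(x, lower, upper):
    """Project x onto the box lower <= x_i <= upper (bound constraint)."""
    return [min(max(v, lower), upper) for v in x]


def prox_l1(x, threshold):
    """Soft-threshold x, the proximal operator of threshold * ||x||_1."""
    out = []
    for v in x:
        if v > threshold:
            out.append(v - threshold)
        elif v < -threshold:
            out.append(v + threshold)
        else:
            out.append(0.0)
    return out
```

    With no such operator applied, only the unconstrained quadratic solve remains, which matches the Cholesky case compared earlier in this comment.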
    
    More details are on the parent JIRA.
    
    Barring the first iteration, the runtimes look comparable to 
CholeskySolver, so this is most likely the version that I am going to push to 
Breeze tomorrow. It would be great if you could give any pointers on the first 
iteration. I will also take a closer look tomorrow.

