Github user debasish83 commented on the pull request:
https://github.com/apache/spark/pull/3221#issuecomment-84522361
@mengxr @dlwh @tmyklebu I integrated Breeze QuadraticMinimizer into the ml.ALS
redesign. As a first step I am comparing the performance of ml.CholeskySolver
with Breeze QuadraticMinimizer. If no proximal operator is specified,
QuadraticMinimizer reduces to a Cholesky solver.
Compared to ml.CholeskySolver, the major difference is that we take the normal
equation ne.ata and form a Breeze DenseMatrix (the full gram matrix), which is
passed to the QuadraticMinimizer API. QuadraticMinimizer maintains its own
gram/quasi-definite workspace, and the solver does not mutate the ata sent from
mllib; it copies it into its workspace. I am not sure whether
ml.CholeskySolver mutates ata through dposv.
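To make the copy-versus-mutate distinction concrete, here is a minimal sketch in plain Scala. It is not the Breeze or ml.ALS code: the object and method names are hypothetical, and the packed upper-triangular, column-major layout assumed for ata is an assumption for illustration only. It expands the packed array into a full gram matrix and solves on a private copy, so the caller's array is left untouched.

```scala
object GramCopyDemo {
  /** Expand a packed upper-triangular, column-major array of length k*(k+1)/2
    * into a full symmetric k x k matrix stored column-major.
    * The packed layout is an assumption made for this sketch. */
  def unpackUpper(packed: Array[Double], k: Int): Array[Double] = {
    require(packed.length == k * (k + 1) / 2, "packed length must be k*(k+1)/2")
    val full = new Array[Double](k * k)
    var idx = 0
    for (j <- 0 until k; i <- 0 to j) {
      full(i + j * k) = packed(idx) // entry (i, j) in the upper triangle
      full(j + i * k) = packed(idx) // mirrored entry (j, i)
      idx += 1
    }
    full
  }

  /** Solve A x = b for a symmetric positive definite A (full, column-major)
    * via Cholesky. The factorization overwrites a private copy of A, so the
    * caller's array is not mutated. */
  def solveSpd(a: Array[Double], b: Array[Double], k: Int): Array[Double] = {
    require(a.length == k * k && b.length == k, "dimension mismatch")
    val l = a.clone() // workspace copy
    // Column-by-column Cholesky: A = L * L^T, with L kept in the lower triangle.
    for (j <- 0 until k) {
      var d = l(j + j * k)
      for (p <- 0 until j) d -= l(j + p * k) * l(j + p * k)
      require(d > 0.0, "matrix is not positive definite")
      val ljj = math.sqrt(d)
      l(j + j * k) = ljj
      for (i <- j + 1 until k) {
        var s = l(i + j * k)
        for (p <- 0 until j) s -= l(i + p * k) * l(j + p * k)
        l(i + j * k) = s / ljj
      }
    }
    val x = b.clone()
    // Forward substitution: L y = b.
    for (i <- 0 until k) {
      var s = x(i)
      for (p <- 0 until i) s -= l(i + p * k) * x(p)
      x(i) = s / l(i + i * k)
    }
    // Backward substitution: L^T x = y.
    for (i <- k - 1 to 0 by -1) {
      var s = x(i)
      for (p <- i + 1 until k) s -= l(p + i * k) * x(p)
      x(i) = s / l(i + i * k)
    }
    x
  }
}
```

For example, `GramCopyDemo.solveSpd(GramCopyDemo.unpackUpper(Array(4.0, 2.0, 3.0), 2), Array(2.0, 1.0), 2)` solves the 2 x 2 system with gram matrix [[4, 2], [2, 3]]. A LAPACK-backed solver that factorizes in place would skip the `clone()` and overwrite its input instead.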
I set the seed to 0L so that both runs give exactly the same results.
Breeze QuadraticMinimizer:
unset solver; ./bin/spark-submit --master
spark://TUSCA09LMLVT00C.local:7077 --class
org.apache.spark.examples.mllib.MovieLensALS --jars
~/.m2/repository/com/github/scopt/scopt_2.10/3.2.0/scopt_2.10-3.2.0.jar
--total-executor-cores 1
./examples/target/spark-examples_2.10-1.3.0-SNAPSHOT.jar --rank 50
--numIterations 2 ~/datasets/ml-1m/ratings.dat
Got 1000209 ratings from 6040 users on 3706 movies.
Training: 800670, test: 199539.
Quadratic minimization userConstraint SMOOTH productConstraint SMOOTH
Running Breeze QuadraticMinimizer
Running Breeze QuadraticMinimizer
Test RMSE = 2.498508112623384.
TUSCA09LMLVT00C:spark-qp-als v606014$ grep solveTime
./work/app-20150321215010-0002/0/stderr
15/03/21 21:50:19 INFO ALS: solveTime 428.926 ms
15/03/21 21:50:20 INFO ALS: solveTime 100.322 ms
15/03/21 21:50:20 INFO ALS: solveTime 98.001 ms
15/03/21 21:50:21 INFO ALS: solveTime 88.865 ms
15/03/21 21:50:21 INFO ALS: solveTime 46.189 ms
15/03/21 21:50:22 INFO ALS: solveTime 35.477 ms
15/03/21 21:50:22 INFO ALS: solveTime 55.875 ms
15/03/21 21:50:23 INFO ALS: solveTime 56.772 ms
15/03/21 21:50:23 INFO ALS: solveTime 33.325 ms
15/03/21 21:50:24 INFO ALS: solveTime 33.64 ms
ML CholeskySolver:
export solver=mllib; ./bin/spark-submit --master
spark://TUSCA09LMLVT00C.local:7077 --class
org.apache.spark.examples.mllib.MovieLensALS --jars
~/.m2/repository/com/github/scopt/scopt_2.10/3.2.0/scopt_2.10-3.2.0.jar
--total-executor-cores 1
./examples/target/spark-examples_2.10-1.3.0-SNAPSHOT.jar --rank 50
--numIterations 2 ~/datasets/ml-1m/ratings.dat
Got 1000209 ratings from 6040 users on 3706 movies.
Training: 800670, test: 199539.
Quadratic minimization userConstraint SMOOTH productConstraint SMOOTH
Test RMSE = 2.498508112623384.
TUSCA09LMLVT00C:spark-qp-als v606014$ grep solveTime
./work/app-20150321215729-0003/0/stderr
15/03/21 21:57:38 INFO ALS: solveTime 101.987 ms
15/03/21 21:57:39 INFO ALS: solveTime 37.585 ms
15/03/21 21:57:39 INFO ALS: solveTime 62.417 ms
15/03/21 21:57:40 INFO ALS: solveTime 60.191 ms
15/03/21 21:57:40 INFO ALS: solveTime 36.817 ms
15/03/21 21:57:41 INFO ALS: solveTime 37.629 ms
15/03/21 21:57:41 INFO ALS: solveTime 61.636 ms
15/03/21 21:57:42 INFO ALS: solveTime 63.741 ms
15/03/21 21:57:42 INFO ALS: solveTime 37.92 ms
15/03/21 21:57:43 INFO ALS: solveTime 36.442 ms
As in https://github.com/apache/spark/pull/5005, the runtimes are comparable
except for the first iteration.
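To compare the two runs with and without the first iteration, the solveTime lines from the grep output can be summarized with a small helper. This is only a sketch: the object and method names are hypothetical, and it assumes each log line has the form "... solveTime <number> ms" as in the output above.

```scala
object SolveTimeSummary {
  // Matches e.g. "15/03/21 21:50:19 INFO ALS: solveTime 428.926 ms".
  private val Pattern = """solveTime\s+([0-9]+(?:\.[0-9]+)?)\s+ms""".r

  /** Extract solve times in milliseconds; lines without a match are skipped. */
  def parse(lines: Seq[String]): Seq[Double] =
    lines.flatMap(line => Pattern.findFirstMatchIn(line).map(_.group(1).toDouble))

  /** Arithmetic mean of a non-empty sequence. */
  def mean(xs: Seq[Double]): Double = {
    require(xs.nonEmpty, "need at least one value")
    xs.sum / xs.size
  }

  /** Mean over all solves, and mean over all but the first solve. */
  def summarize(lines: Seq[String]): (Double, Double) = {
    val times = parse(lines)
    require(times.size >= 2, "need at least two solveTime lines")
    (mean(times), mean(times.tail))
  }
}
```

Feeding each run's stderr lines to `SolveTimeSummary.summarize` gives one pair of means per solver, which makes the effect of the first iteration easy to separate from the steady-state solves.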
Let's focus on Cholesky first, but QuadraticMinimizer was designed to add
the following features to ml.ALS and enhance its collaborative filtering
capabilities:
1. Add userConstraint and productConstraint to ml.ALS. I have already added
them for my experiments, but right now only the SMOOTH constraint (the
default) is activated.
2. For ANNLS we are still using mllib NNLS, which will hopefully be moved to
Breeze NNLS.
3. Sparse coding through L2 and L1 constraints.
4. LSA with quadratic loss (equality and positivity constraints on users,
bounds on products, column normalization on products after every ALS iteration).
5. Probabilistic matrix factorization through bound constraints.
More details are on the parent JIRA.
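Several of the constraints listed above correspond to simple proximal operators. The following is a minimal sketch of three standard ones in plain Scala. It is not the Breeze QuadraticMinimizer API: the object and method names are hypothetical and chosen only for illustration.

```scala
object ProximalSketch {
  /** Positivity constraint: projection onto the nonnegative orthant. */
  def projectPositive(x: Array[Double]): Array[Double] =
    x.map(v => math.max(v, 0.0))

  /** Bound constraint: projection onto the box [lb, ub], elementwise. */
  def projectBox(x: Array[Double], lb: Double, ub: Double): Array[Double] = {
    require(lb <= ub, "lower bound must not exceed upper bound")
    x.map(v => math.min(math.max(v, lb), ub))
  }

  /** L1 constraint: soft-thresholding, the proximal operator of lambda * ||x||_1. */
  def proxL1(x: Array[Double], lambda: Double): Array[Double] = {
    require(lambda >= 0.0, "lambda must be nonnegative")
    x.map(v => math.signum(v) * math.max(math.abs(v) - lambda, 0.0))
  }
}
```

In a proximal splitting scheme, one of these maps would be applied after each quadratic solve. With no proximal step at all, only the quadratic solve remains, which matches the remark that the minimizer then behaves like a Cholesky solver.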
Barring the first iteration, the runtimes look comparable with CholeskySolver,
so this is most likely the version that I am going to push to Breeze tomorrow.
It would be great if you could give any pointers on the first iteration. I will
also take a closer look tomorrow.