Github user debasish83 commented on the pull request:
https://github.com/apache/spark/pull/5005#issuecomment-81358204
I first compared Breeze NNLS against mllib NNLS, since that is the simpler case.
The algorithm is the same one implemented by @coderxiang. I migrated it to
Breeze because it is a local solver, and followed the Breeze optimization
pattern. Right now the Breeze packages are for benchmarking purposes.
The breeze.optimize.linear and breeze.optimize.proximal packages will be
cleaned up once we are done with the stress test.
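For readers unfamiliar with NNLS, here is a minimal sketch of what a local non-negative least squares solve does. It is not the code from the PR, from mllib, or from Breeze: it swaps in plain projected gradient descent on the normal equations, and the function name and the tiny test problem are made up for illustration.

```python
def nnls_projected_gradient(ata, atb, iterations=500):
    """Minimize 0.5 * x'(ata)x - atb'x subject to x >= 0.

    Hypothetical helper: plain projected gradient, chosen for brevity.
    """
    n = len(atb)
    # The trace of a positive semidefinite matrix bounds its largest
    # eigenvalue, so 1 / trace is a safe (if conservative) step size.
    step = 1.0 / sum(ata[i][i] for i in range(n))
    x = [0.0] * n
    for _ in range(iterations):
        grad = [sum(ata[i][j] * x[j] for j in range(n)) - atb[i]
                for i in range(n)]
        # Gradient step, then projection onto the non-negative orthant.
        x = [max(0.0, x[i] - step * grad[i]) for i in range(n)]
    return x


# The unconstrained minimizer of this problem is (1, -1); the constraint
# x >= 0 pins the second coordinate at zero.
solution = nnls_projected_gradient([[2.0, 0.0], [0.0, 1.0]], [2.0, -1.0])
```

The solver only needs the k x k Gram matrix and a length-k vector, which is why it can run locally inside each ALS subproblem.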
I tried to set all the seeds to 0L so that both runs look at the same
data (the train set and test set have the same number of records, and the ALS
seed is at 0L anyway).
To run Breeze NNLS:
export solver=breeze; ./bin/spark-submit --master
spark://TUSCA09LMLVT00C.local:7077 --total-executor-cores 2 --class
org.apache.spark.examples.mllib.MovieLensALS --jars
~/.m2/repository/com/github/scopt/scopt_2.10/3.2.0/scopt_2.10-3.2.0.jar
./examples/target/spark-examples_2.10-1.3.0-SNAPSHOT.jar
~/datasets/ml-1m/ratings.dat --nonNegative --numIterations 2
TUSCA09LMLVT00C:spark-brznnls v606014$ grep solveTime
./work/breeze-nnls/0/stderr
15/03/15 19:23:43 INFO ALS: solveTime 171.149 ms
15/03/15 19:23:43 INFO ALS: solveTime 164.08 ms
15/03/15 19:23:43 INFO ALS: solveTime 69.235 ms
15/03/15 19:23:43 INFO ALS: solveTime 74.665 ms
15/03/15 19:23:43 INFO ALS: solveTime 31.05 ms
15/03/15 19:23:43 INFO ALS: solveTime 32.547 ms
15/03/15 19:23:44 INFO ALS: solveTime 52.543 ms
15/03/15 19:23:44 INFO ALS: solveTime 53.277 ms
15/03/15 19:23:44 INFO ALS: solveTime 31.844 ms
15/03/15 19:23:44 INFO ALS: solveTime 31.71 ms
To run mllib NNLS:
TUSCA09LMLVT00C:spark-brznnls v606014$ grep solveTime
./work/mllib-nnls/0/stderr
15/03/15 19:25:06 INFO ALS: solveTime 156.808 ms
15/03/15 19:25:06 INFO ALS: solveTime 156.945 ms
15/03/15 19:25:06 INFO ALS: solveTime 76.609 ms
15/03/15 19:25:06 INFO ALS: solveTime 76.628 ms
15/03/15 19:25:06 INFO ALS: solveTime 42.312 ms
15/03/15 19:25:06 INFO ALS: solveTime 40.153 ms
15/03/15 19:25:07 INFO ALS: solveTime 72.031 ms
15/03/15 19:25:07 INFO ALS: solveTime 73.184 ms
15/03/15 19:25:07 INFO ALS: solveTime 37.863 ms
15/03/15 19:25:07 INFO ALS: solveTime 39.34 ms
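To compare the two runs at a glance, the solveTime values can be summed per solver. The sketch below copies the log lines verbatim from the two grep outputs above; the helper name total_solve_time is hypothetical.

```python
import re

breeze_log = """\
15/03/15 19:23:43 INFO ALS: solveTime 171.149 ms
15/03/15 19:23:43 INFO ALS: solveTime 164.08 ms
15/03/15 19:23:43 INFO ALS: solveTime 69.235 ms
15/03/15 19:23:43 INFO ALS: solveTime 74.665 ms
15/03/15 19:23:43 INFO ALS: solveTime 31.05 ms
15/03/15 19:23:43 INFO ALS: solveTime 32.547 ms
15/03/15 19:23:44 INFO ALS: solveTime 52.543 ms
15/03/15 19:23:44 INFO ALS: solveTime 53.277 ms
15/03/15 19:23:44 INFO ALS: solveTime 31.844 ms
15/03/15 19:23:44 INFO ALS: solveTime 31.71 ms
"""

mllib_log = """\
15/03/15 19:25:06 INFO ALS: solveTime 156.808 ms
15/03/15 19:25:06 INFO ALS: solveTime 156.945 ms
15/03/15 19:25:06 INFO ALS: solveTime 76.609 ms
15/03/15 19:25:06 INFO ALS: solveTime 76.628 ms
15/03/15 19:25:06 INFO ALS: solveTime 42.312 ms
15/03/15 19:25:06 INFO ALS: solveTime 40.153 ms
15/03/15 19:25:07 INFO ALS: solveTime 72.031 ms
15/03/15 19:25:07 INFO ALS: solveTime 73.184 ms
15/03/15 19:25:07 INFO ALS: solveTime 37.863 ms
15/03/15 19:25:07 INFO ALS: solveTime 39.34 ms
"""


def total_solve_time(log):
    """Sum every 'solveTime <number> ms' value found in the log text."""
    return sum(float(ms) for ms in re.findall(r"solveTime ([0-9.]+) ms", log))


breeze_total = total_solve_time(breeze_log)
mllib_total = total_solve_time(mllib_log)
print(f"breeze: {breeze_total:.3f} ms, mllib: {mllib_total:.3f} ms")
```

Over these ten entries per run the totals come to about 712.1 ms for Breeze and 771.9 ms for mllib. This is a single run with 2 iterations, so it is an indication rather than a benchmark.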
Next I will compare CholeskySolver against the QuadraticMinimizer default.
The memory optimization for triangular storage will be common to both
mllib/breeze NNLS and breeze QuadraticMinimizer. I will take that up as
an enhancement PR for breeze. It is a bit tricky for QuadraticMinimizer
in particular, since it supports affine constraints of the form Aeq x = beq
and inequalities A x <= b or lb <= x <= ub; the affine constraints are what
complicate it.
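As a rough illustration of the triangular memory optimization, the sketch below assumes it means keeping only one triangle of the symmetric Gram matrix, so a k x k matrix takes k(k+1)/2 entries instead of k*k. The column-by-column upper-triangle layout and both function names are assumptions for the example, not taken from the PR.

```python
def pack_upper(matrix):
    """Pack the upper triangle of a symmetric matrix column by column."""
    n = len(matrix)
    return [matrix[i][j] for j in range(n) for i in range(j + 1)]


def packed_index(i, j):
    """Position of entry (i, j) in the packed array, for any i and j."""
    if i > j:
        i, j = j, i  # symmetry: (i, j) and (j, i) share one slot
    return j * (j + 1) // 2 + i


gram = [[4.0, 1.0, 2.0],
        [1.0, 5.0, 3.0],
        [2.0, 3.0, 6.0]]
packed = pack_upper(gram)  # 6 stored values instead of 9
```

Any solver working on the packed form has to go through an index mapping like packed_index (or a routine that understands the layout) instead of reading matrix[i][j] directly.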