[jira] [Comment Edited] (SPARK-2426) Quadratic Minimization for MLlib ALS

2014-11-19 Thread Debasish Das (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218891#comment-14218891
 ] 

Debasish Das edited comment on SPARK-2426 at 11/20/14 2:13 AM:
---

With the MAP measures added to examples.MovieLensALS through 
https://issues.apache.org/jira/browse/SPARK-4231, I compared the quality and 
runtime of the matrix completion formulations on the MovieLens 1M dataset:

Default: userConstraint L2, productConstraint L2, lambdaUser=lambdaProduct=0.065, 
rank=100, iterations=10

Test RMSE = 0.8436480113821955.
Test users 6038 MAP 0.05860164548002782

Solver: Cholesky decomposition followed by forward-backward solves
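
For reference, the baseline solve named above can be sketched in plain Scala as follows. This is only an illustration of Cholesky factorization followed by forward and backward substitution on a small dense symmetric positive definite system; the object and method names are hypothetical and it is not the code path used by MLlib ALS.

```scala
object CholeskySolveSketch {
  /** Lower-triangular factor L of a symmetric positive definite matrix A, with A = L * L^T. */
  def cholesky(a: Array[Array[Double]]): Array[Array[Double]] = {
    val n = a.length
    val l = Array.ofDim[Double](n, n)
    for (i <- 0 until n; j <- 0 to i) {
      var sum = a(i)(j)
      for (k <- 0 until j) sum -= l(i)(k) * l(j)(k)
      if (i == j) {
        require(sum > 0.0, "matrix is not positive definite")
        l(i)(j) = math.sqrt(sum)
      } else {
        l(i)(j) = sum / l(j)(j)
      }
    }
    l
  }

  /** Solves A x = b via L y = b (forward solve) and L^T x = y (backward solve). */
  def solve(a: Array[Array[Double]], b: Array[Double]): Array[Double] = {
    val n = b.length
    val l = cholesky(a)
    // Forward substitution: L y = b
    val y = new Array[Double](n)
    for (i <- 0 until n) {
      var sum = b(i)
      for (k <- 0 until i) sum -= l(i)(k) * y(k)
      y(i) = sum / l(i)(i)
    }
    // Backward substitution: L^T x = y
    val x = new Array[Double](n)
    for (i <- (n - 1) to 0 by -1) {
      var sum = y(i)
      for (k <- (i + 1) until n) sum -= l(k)(i) * x(k)
      x(i) = sum / l(i)(i)
    }
    x
  }
}
```

In ALS the matrix passed in would be the regularized gram matrix of one user or product block, so the factorization cost depends on the rank only, not on the number of ratings.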

Per iteration runtime for baseline (solveTime in ms)

14/11/19 17:37:06 INFO ALS: usersOrProducts 924 slowConvergence 0 
QuadraticMinimizer solveTime 362.813 Iters 0
14/11/19 17:37:06 INFO ALS: usersOrProducts 910 slowConvergence 0 
QuadraticMinimizer solveTime 314.527 Iters 0
14/11/19 17:37:06 INFO ALS: usersOrProducts 927 slowConvergence 0 
QuadraticMinimizer solveTime 265.75 Iters 0
14/11/19 17:37:06 INFO ALS: usersOrProducts 918 slowConvergence 0 
QuadraticMinimizer solveTime 271.513 Iters 0
14/11/19 17:37:09 INFO ALS: usersOrProducts 1510 slowConvergence 0 
QuadraticMinimizer solveTime 370.177 Iters 0
14/11/19 17:37:09 INFO ALS: usersOrProducts 1512 slowConvergence 0 
QuadraticMinimizer solveTime 467.994 Iters 0
14/11/19 17:37:09 INFO ALS: usersOrProducts 1507 slowConvergence 0 
QuadraticMinimizer solveTime 511.894 Iters 0
14/11/19 17:37:09 INFO ALS: usersOrProducts 1511 slowConvergence 0 
QuadraticMinimizer solveTime 481.189 Iters 0

NMF: userConstraint POSITIVE, productConstraint POSITIVE, 
userLambda=productLambda=0.065 L2 regularization

Got 1000209 ratings from 6040 users on 3706 movies.
Training: 800670, test: 199539.
Quadratic minimization userConstraint POSITIVE productConstraint POSITIVE
Test RMSE = 0.8435335132641906.
Test users 6038 MAP 0.056361816590625446

ALS iteration 1 runtime:

QuadraticMinimizer convergence profile:

14/11/19 17:46:46 INFO ALS: usersOrProducts 918 slowConvergence 0 
QuadraticMinimizer solveTime 1936.281 Iters 73132
14/11/19 17:46:46 INFO ALS: usersOrProducts 927 slowConvergence 0 
QuadraticMinimizer solveTime 1871.364 Iters 75219
14/11/19 17:46:46 INFO ALS: usersOrProducts 910 slowConvergence 0 
QuadraticMinimizer solveTime 2067.735 Iters 73180
14/11/19 17:46:46 INFO ALS: usersOrProducts 924 slowConvergence 0 
QuadraticMinimizer solveTime 2127.161 Iters 75546
14/11/19 17:46:53 INFO ALS: usersOrProducts 1507 slowConvergence 0 
QuadraticMinimizer solveTime 3813.923 Iters 193207
14/11/19 17:46:54 INFO ALS: usersOrProducts 1511 slowConvergence 0 
QuadraticMinimizer solveTime 3894.068 Iters 196882
14/11/19 17:46:54 INFO ALS: usersOrProducts 1510 slowConvergence 0 
QuadraticMinimizer solveTime 3875.915 Iters 193987
14/11/19 17:46:54 INFO ALS: usersOrProducts 1512 slowConvergence 0 
QuadraticMinimizer solveTime 3939.765 Iters 192471

NNLS convergence profile:

14/11/19 17:46:46 INFO ALS: NNLS solveTime 252.909 iters 7381
14/11/19 17:46:46 INFO ALS: NNLS solveTime 256.803 iters 7740
14/11/19 17:46:46 INFO ALS: NNLS solveTime 274.352 iters 7491
14/11/19 17:46:46 INFO ALS: NNLS solveTime 272.971 iters 7664
14/11/19 17:46:53 INFO ALS: NNLS solveTime 1487.262 iters 60338
14/11/19 17:46:54 INFO ALS: NNLS solveTime 1472.742 iters 61321
14/11/19 17:46:54 INFO ALS: NNLS solveTime 1489.863 iters 62228
14/11/19 17:46:54 INFO ALS: NNLS solveTime 1494.192 iters 60489

ALS iteration 10

QuadraticMinimizer convergence profile:

14/11/19 17:48:17 INFO ALS: usersOrProducts 924 slowConvergence 0 
QuadraticMinimizer solveTime 1082.056 Iters 53724
14/11/19 17:48:17 INFO ALS: usersOrProducts 910 slowConvergence 0 
QuadraticMinimizer solveTime 1180.601 Iters 50593
14/11/19 17:48:17 INFO ALS: usersOrProducts 927 slowConvergence 0 
QuadraticMinimizer solveTime 1106.131 Iters 53069
14/11/19 17:48:17 INFO ALS: usersOrProducts 918 slowConvergence 0 
QuadraticMinimizer solveTime 1108.478 Iters 50895
14/11/19 17:48:23 INFO ALS: usersOrProducts 1510 slowConvergence 0 
QuadraticMinimizer solveTime 2262.193 Iters 116818
14/11/19 17:48:23 INFO ALS: usersOrProducts 1512 slowConvergence 0 
QuadraticMinimizer solveTime 2293.64 Iters 116026
14/11/19 17:48:23 INFO ALS: usersOrProducts 1507 slowConvergence 0 
QuadraticMinimizer solveTime 2241.491 Iters 116293
14/11/19 17:48:23 INFO ALS: usersOrProducts 1511 slowConvergence 0 
QuadraticMinimizer solveTime 2372.957 Iters 118391

NNLS convergence profile:

14/11/19 17:48:17 INFO ALS: NNLS solveTime 623.031 iters 21611
14/11/19 17:48:17 INFO ALS: NNLS solveTime 553.493 iters 21732
14/11/19 17:48:17 INFO ALS: NNLS solveTime 559.9 iters 22511
14/11/19 17:48:17 INFO ALS: NNLS solveTime 556.654 iters 21330
14/11/19 17:48:23 INFO ALS: NNLS solveTime 1672.582 iters 86006
14/11/19 17:48:23 INFO ALS: NNLS solveTime 

[jira] [Commented] (SPARK-1405) parallel Latent Dirichlet Allocation (LDA) atop of spark in MLlib

2014-11-19 Thread Debasish Das (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218941#comment-14218941
 ] 

Debasish Das commented on SPARK-1405:
-

For LSA you can find references on the PR. The Microsoft paper ran LSA on the wiki 
dataset, but they compared MAP and NDCG measures...

Both the NIPS and wiki datasets will help...I was thinking about wiki...could you 
please add the NIPS dataset as well, with a reference for it?

I will look into GraphLab LDA and what measures they are running...

 parallel Latent Dirichlet Allocation (LDA) atop of spark in MLlib
 -

 Key: SPARK-1405
 URL: https://issues.apache.org/jira/browse/SPARK-1405
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Xusen Yin
Assignee: Guoqiang Li
Priority: Critical
  Labels: features
 Attachments: performance_comparison.png

   Original Estimate: 336h
  Remaining Estimate: 336h

 Latent Dirichlet Allocation (a.k.a. LDA) is a topic model which extracts 
 topics from a text corpus. Unlike the current machine learning algorithms 
 in MLlib, which use optimization algorithms such as gradient descent, 
 LDA uses expectation algorithms such as Gibbs sampling. 
 In this PR, I prepare an LDA implementation based on Gibbs sampling, with a 
 wholeTextFiles API (solved yet), a word segmentation (import from Lucene), 
 and a Gibbs sampling core.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3066) Support recommendAll in matrix factorization model

2014-11-13 Thread Debasish Das (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209936#comment-14209936
 ] 

Debasish Das commented on SPARK-3066:
-

On our internal datasets, flatMap is slow...I am changing the code to have 2 
methods (assuming users are tall and products are skinny)...if users and products 
are both tall and wide, then we need to rethink.

recommendAllUsers: takeOrdered is called on the dot product of each userFeature with productFeatures

recommendAllProducts: mapPartitions will emit Iterator(productId, 
userPriorityQueue) and reduceByKey will generate the topK users for each 
product..
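
The per-user top-K step described above can be sketched with plain Scala collections. This is an assumption-laden illustration: ordinary sequences stand in for the RDDs, a bounded heap stands in for takeOrdered, and the object and method names are hypothetical rather than part of the PR.

```scala
import scala.collection.mutable

object TopKSketch {
  /** Dot product of two equal-length factor vectors. */
  def dot(a: Array[Double], b: Array[Double]): Double = {
    var s = 0.0
    var i = 0
    while (i < a.length) { s += a(i) * b(i); i += 1 }
    s
  }

  /** Top `num` (productId, score) pairs for one user, highest score first. */
  def topK(user: Array[Double],
           products: Seq[(Int, Array[Double])],
           num: Int): Seq[(Int, Double)] = {
    require(num > 0, "num must be positive")
    // Ordering by negated score makes the heap head the weakest candidate kept so far.
    val heap = mutable.PriorityQueue.empty[(Int, Double)](
      Ordering.by[(Int, Double), Double](p => -p._2))
    for ((id, features) <- products) {
      val score = dot(user, features)
      if (heap.size < num) {
        heap.enqueue((id, score))
      } else if (score > heap.head._2) {
        heap.dequeue()
        heap.enqueue((id, score))
      }
    }
    heap.toList.sortBy(p => -p._2)
  }
}
```

Keeping only `num` candidates per user bounds the memory per key, which is the same reason a priority queue per product is emitted from mapPartitions before reduceByKey merges them.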


 Support recommendAll in matrix factorization model
 --

 Key: SPARK-3066
 URL: https://issues.apache.org/jira/browse/SPARK-3066
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Xiangrui Meng

 ALS returns a matrix factorization model, which we can use to predict ratings 
 for individual queries as well as small batches. In practice, users may want 
 to compute top-k recommendations offline for all users. It is very expensive 
 but a common problem. We can do some optimization like
 1) collect one side (either user or product) and broadcast it as a matrix
 2) use level-3 BLAS to compute inner products
 3) use Utils.takeOrdered to find top-k






[jira] [Commented] (SPARK-3066) Support recommendAll in matrix factorization model

2014-11-11 Thread Debasish Das (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207298#comment-14207298
 ] 

Debasish Das commented on SPARK-3066:
-

[~mengxr] I am testing the recommendAllUsers and recommendAllProducts APIs and I 
will add the code to the RankingMetrics PR:
https://github.com/apache/spark/pull/3098

I have not used level-3 BLAS yet since we should be able to re-use the 
DistributedMatrix API that's coming online (here all the matrices are 
dense)...I used ideas 1 and 2, and I also added a skipRatings parameter to the 
API (using it you can skip the ratings that each user has already provided...for 
validation I basically skip the train set)

Example API:

  def recommendAllUsers(num: Int, skipUserRatings: RDD[Rating]) = {
    // Key the ratings to skip by (user, product)
    val skipUsers = skipUserRatings.map { x => ((x.user, x.product), x.rating) }
    // Collect the product factors to the driver
    val productVectors = productFeatures.collect
    recommend(productVectors, userFeatures, num, skipUsers)
  }

  def recommendAllProducts(num: Int, skipProductRatings: RDD[Rating]) = {
    // Key the ratings to skip by (product, user)
    val skipProducts = skipProductRatings.map { x => ((x.product, x.user), x.rating) }
    // Collect the user factors to the driver
    val userVectors = userFeatures.collect
    recommend(userVectors, productFeatures, num, skipProducts)
  }







[jira] [Commented] (SPARK-4231) Add RankingMetrics to examples.MovieLensALS

2014-11-06 Thread Debasish Das (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200373#comment-14200373
 ] 

Debasish Das commented on SPARK-4231:
-

[~coderxiang] [~mengxr] [~srowen]

I looked at Sean's implementation of the MAP metric for recommendation engines and 
it computes a rank of the predicted test set over all user/product predictions...I 
don't see how I can send the rank vector to the RankingMetrics API right now

http://cloudera.github.io/oryx/xref/com/cloudera/oryx/als/computation/local/ComputeMAP.html
 

Right now for each user/product I send predictions and labels from the test set to 
the RankingMetrics API but there is no rank order defined...I retrieved the 
predictions using ALS.predict(userId, productId) API...
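
To make the rank-order question concrete, here is a minimal sketch of MAP over ranked lists in plain Scala. It assumes one common definition (precision accumulated at each relevant hit, divided by the number of relevant items); whether this matches the definition in the PR or in the Oryx code is not confirmed here, and all names are hypothetical.

```scala
object MeanAveragePrecisionSketch {
  /** Orders item ids by predicted score, highest first, to define the rank. */
  def rankByScore(scored: Seq[(Int, Double)]): Seq[Int] =
    scored.sortBy(p => -p._2).map(_._1)

  /** Average precision of one ranked list against the set of relevant items. */
  def averagePrecision(ranked: Seq[Int], relevant: Set[Int]): Double = {
    if (relevant.isEmpty) 0.0
    else {
      var hits = 0
      var precisionSum = 0.0
      for ((item, idx) <- ranked.zipWithIndex) {
        if (relevant.contains(item)) {
          hits += 1
          // Precision at this position: hits so far over items seen so far.
          precisionSum += hits.toDouble / (idx + 1)
        }
      }
      precisionSum / relevant.size
    }
  }

  /** Mean of the per-user average precisions. */
  def meanAveragePrecision(perUser: Seq[(Seq[Int], Set[Int])]): Double =
    if (perUser.isEmpty) 0.0
    else perUser.map { case (r, rel) => averagePrecision(r, rel) }.sum / perUser.size
}
```

The point of `rankByScore` is that predictions obtained one (user, product) pair at a time carry no order until they are sorted by score per user.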

 Add RankingMetrics to examples.MovieLensALS
 ---

 Key: SPARK-4231
 URL: https://issues.apache.org/jira/browse/SPARK-4231
 Project: Spark
  Issue Type: Improvement
  Components: Examples
Affects Versions: 1.2.0
Reporter: Debasish Das
 Fix For: 1.2.0

   Original Estimate: 24h
  Remaining Estimate: 24h

 examples.MovieLensALS computes RMSE for movielens dataset but after addition 
 of RankingMetrics and enhancements to ALS, it is critical to look at not only 
 the RMSE but also measures like prec@k and MAP.
 In this JIRA we added RMSE and MAP computation for examples.MovieLensALS and 
 also added a flag that takes an input whether user/product recommendation is 
 being validated.
  






[jira] [Commented] (SPARK-4231) Add RankingMetrics to examples.MovieLensALS

2014-11-06 Thread Debasish Das (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200426#comment-14200426
 ] 

Debasish Das commented on SPARK-4231:
-

[~srowen] I need a standard metric to report the ALS enhancements proposed in 
this PR: https://github.com/apache/spark/pull/2705 and more work that we are 
doing in this direction...

The metric should be consistent for topic modeling using LSA as well, since the PR 
can solve LSA/PLSA...I have not yet gone into measures like perplexity etc. from the 
LDA PRs as they are even more complicated, but MAP, prec@k and ndcg@k are 
measures people have used to report LDA results as well..

Does the MAP definition I used in this PR look correct to you? Let me look 
into the AUC example...

  






[jira] [Created] (SPARK-4231) Add RankingMetrics to examples.MovieLensALS

2014-11-04 Thread Debasish Das (JIRA)
Debasish Das created SPARK-4231:
---

 Summary: Add RankingMetrics to examples.MovieLensALS
 Key: SPARK-4231
 URL: https://issues.apache.org/jira/browse/SPARK-4231
 Project: Spark
  Issue Type: Improvement
  Components: Examples
Affects Versions: 1.2.0
Reporter: Debasish Das
 Fix For: 1.2.0


examples.MovieLensALS computes RMSE for movielens dataset but after addition of 
RankingMetrics and enhancements to ALS, it is critical to look at not only the 
RMSE but also measures like prec@k and MAP.

In this JIRA we added RMSE and MAP computation for examples.MovieLensALS and 
also added a flag that takes an input whether user/product recommendation is 
being validated.
 






[jira] [Updated] (SPARK-2426) Quadratic Minimization for MLlib ALS

2014-11-03 Thread Debasish Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Debasish Das updated SPARK-2426:

Affects Version/s: (was: 1.0.0)
   1.2.0

 Quadratic Minimization for MLlib ALS
 

 Key: SPARK-2426
 URL: https://issues.apache.org/jira/browse/SPARK-2426
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Affects Versions: 1.2.0
Reporter: Debasish Das
Assignee: Debasish Das
   Original Estimate: 504h
  Remaining Estimate: 504h

 Current ALS supports least squares and nonnegative least squares.
 I presented ADMM and IPM based Quadratic Minimization solvers to be used for 
 the following ALS problems:
 1. ALS with bounds
 2. ALS with L1 regularization
 3. ALS with Equality constraint and bounds
 Initial runtime comparisons are presented at Spark Summit. 
 http://spark-summit.org/2014/talk/quadratic-programing-solver-for-non-negative-matrix-factorization-with-spark
 Based on Xiangrui's feedback I am currently comparing the ADMM based 
 Quadratic Minimization solvers with IPM based QpSolvers and the default 
 ALS/NNLS. I will keep updating the runtime comparison results.
 For integration the detailed plan is as follows:
 1. Add QuadraticMinimizer and Proximal algorithms in mllib.optimization
 2. Integrate QuadraticMinimizer in mllib ALS






[jira] [Updated] (SPARK-2426) Quadratic Minimization for MLlib ALS

2014-11-03 Thread Debasish Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Debasish Das updated SPARK-2426:

Affects Version/s: (was: 1.2.0)
   1.3.0

 Quadratic Minimization for MLlib ALS
 

 Key: SPARK-2426
 URL: https://issues.apache.org/jira/browse/SPARK-2426
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Affects Versions: 1.3.0
Reporter: Debasish Das
Assignee: Debasish Das
   Original Estimate: 504h
  Remaining Estimate: 504h







[jira] [Commented] (SPARK-3987) NNLS generates incorrect result

2014-10-31 Thread Debasish Das (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14191502#comment-14191502
 ] 

Debasish Das commented on SPARK-3987:
-

Nope...standard ALS...same as netflix params...0.065 as L2...My ratings are
not within 1-5 but more like 1-10...

Also what's a good condition number for NNLS ?

On Thu, Oct 30, 2014 at 11:25 PM, Xiangrui Meng (JIRA) j...@apache.org



 NNLS generates incorrect result
 ---

 Key: SPARK-3987
 URL: https://issues.apache.org/jira/browse/SPARK-3987
 Project: Spark
  Issue Type: Bug
  Components: MLlib
Affects Versions: 1.1.0
Reporter: Debasish Das
Assignee: Shuo Xiang
 Fix For: 1.1.1, 1.2.0


 Hi,
 Please see the example gram matrix and linear term:
 val P2 = new DoubleMatrix(20, 20, 333907.312770, -60814.043975, 
 207935.829941, -162881.367739, -43730.396770, 17511.428983, -243340.496449, 
 -225245.957922, 104700.445881, 32430.845099, 336378.693135, -373497.970207, 
 -41147.159621, 53928.060360, -293517.883778, 53105.278068, 0.00, 
 -85257.781696, 84913.970469, -10584.080103, -60814.043975, 13826.806664, 
 -38032.612640, 33475.833875, 10791.916809, -1040.950810, 48106.552472, 
 45390.073380, -16310.282190, -2861.455903, -60790.833191, 73109.516544, 
 9826.614644, -8283.992464, 56991.742991, -6171.366034, 0.00, 
 19152.382499, -13218.721710, 2793.734234, 207935.829941, -38032.612640, 
 129661.677608, -101682.098412, -27401.299347, 10787.713362, -151803.006149, 
 -140563.601672, 65067.935324, 20031.263383, 209521.268600, -232958.054688, 
 -25764.179034, 33507.951918, -183046.845592, 32884.782835, 0.00, 
 -53315.811196, 52770.762546, -6642.187643, -162881.367739, 33475.833875, 
 -101682.098412, 85094.407608, 25422.850782, -5437.646141, 124197.166330, 
 116206.265909, -47093.484134, -11420.168521, -163429.436848, 189574.783900, 
 23447.172314, -24087.375367, 148311.355507, -20848.385466, 0.00, 
 46835.814559, -38180.352878, 6415.873901, -43730.396770, 10791.916809, 
 -27401.299347, 25422.850782, 8882.869799, 15.638084, 35933.473986, 
 34186.371325, -10745.330690, -974.314375, -43537.709621, 54371.010558, 
 7894.453004, -5408.929644, 42231.381747, -3192.010574, 0.00, 
 15058.753110, -8704.757256, 2316.581535, 17511.428983, -1040.950810, 
 10787.713362, -5437.646141, 15.638084, 2794.949847, -9681.950987, 
 -8258.171646, 7754.358930, 4193.359412, 18052.143842, -15456.096769, 
 -253.356253, 4089.672804, -12524.380088, 5651.579348, 0.00, -1513.302547, 
 6296.461898, 152.427321, -243340.496449, 48106.552472, -151803.006149, 
 124197.166330, 35933.473986, -9681.950987, 182931.600236, 170454.352953, 
 -72361.174145, -19270.461728, -244518.179729, 279551.060579, 33340.452802, 
 -37103.267653, 219025.288975, -33687.141423, 0.00, 67347.950443, 
 -58673.009647, 8957.800259, -225245.957922, 45390.073380, -140563.601672, 
 116206.265909, 34186.371325, -8258.171646, 170454.352953, 159322.942894, 
 -66074.960534, -16839.743193, -226173.967766, 260421.044094, 31624.194003, 
 -33839.612565, 203889.695169, -30034.828909, 0.00, 63525.040745, 
 -53572.741748, 8575.071847, 104700.445881, -16310.282190, 65067.935324, 
 -47093.484134, -10745.330690, 7754.358930, -72361.174145, -66074.960534, 
 35869.598076, 13378.653317, 106033.647837, -111831.682883, -10455.465743, 
 18537.392481, -88370.612394, 20344.288488, 0.00, -22935.482766, 
 29004.543704, -2409.461759, 32430.845099, -2861.455903, 20031.263383, 
 -11420.168521, -974.314375, 4193.359412, -19270.461728, -16839.743193, 
 13378.653317, 6802.081898, 33256.395091, -30421.985199, -1296.785870, 
 7026.518692, -24443.378205, 9221.982599, 0.00, -4088.076871, 
 10861.014242, -25.092938, 336378.693135, -60790.833191, 209521.268600, 
 -163429.436848, -43537.709621, 18052.143842, -244518.179729, -226173.967766, 
 106033.647837, 33256.395091, 339200.268106, -375442.716811, -41027.594509, 
 54636.778527, -295133.248586, 54177.278365, 0.00, -85237.666701, 
 85996.957056, -10503.209968, -373497.970207, 73109.516544, -232958.054688, 
 189574.783900, 54371.010558, -15456.096769, 279551.060579, 260421.044094, 
 -111831.682883, -30421.985199, -375442.716811, 427793.208465, 50528.074431, 
 -57375.986301, 335203.382015, -52676.385869, 0.00, 102368.307670, 
 -90679.792485, 13509.390393, -41147.159621, 9826.614644, -25764.179034, 
 23447.172314, 7894.453004, -253.356253, 33340.452802, 31624.194003, 
 -10455.465743, -1296.785870, -41027.594509, 50528.074431, 7255.977434, 
 -5281.636812, 39298.355527, -3440.450858, 0.00, 13717.870243, 
 -8471.405582, 2071.812204, 53928.060360, -8283.992464, 33507.951918, 
 -24087.375367, -5408.929644, 4089.672804, -37103.267653, -33839.612565, 
 18537.392481, 7026.518692, 54636.778527, -57375.986301, -5281.636812, 
 9735.061160, -45360.674033, 

[jira] [Commented] (SPARK-2426) Quadratic Minimization for MLlib ALS

2014-10-31 Thread Debasish Das (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14191551#comment-14191551
 ] 

Debasish Das commented on SPARK-2426:
-

[~mengxr] The matlab comparison scripts are open sourced over here:

https://github.com/debasish83/ecos/blob/master/matlab/admm/qprandom.m
https://github.com/debasish83/ecos/blob/master/matlab/pdco4/code/pdcotestQP.m

The detailed comparisons are in the README.md. Please look at the section on 
Matlab comparisons.

In a nutshell, for bounds MOSEK and ADMM are similar, for elastic net Proximal 
is 10X faster than MOSEK, and for equality MOSEK is 2-3X faster than 
Proximal, but both PDCO and ECOS produce much worse results than ADMM. 
Accelerated ADMM also did not work as well as default ADMM. Increasing the 
over-relaxation parameter helped accelerated ADMM but I have not explored it 
yet.

ADMM and PDCO are in Matlab but ECOS and MOSEK are both using mex files so they 
are expected to be more efficient.

Next I will add the performance results of running positivity, box, sparse 
coding / regularized lsi and robust-plsa on MovieLens dataset and validate 
product recommendation using the MAP measure...In terms of RMSE, default  
positive  sparse coding...

What are the largest datasets the LDA PRs are running on? I would like to try them 
with sparse coding as well...According to these papers, sparse coding/RLSI should 
give results on par with LDA:

https://www.cs.cmu.edu/~xichen/images/SLSA-sdm11-final.pdf
http://web.stanford.edu/group/mmds/slides2012/s-hli.pdf

The same randomized matrices can be generated and run in the PR as follows:

./bin/spark-class org.apache.spark.mllib.optimization.QuadraticMinimizer 1000 1 
1.0 0.99

rank=1000, equality=1.0 lambda=1.0 beta=0.99
L1regularization = lambda*beta L2regularization = lambda*(1-beta)

Generating randomized QPs with rank 1000 equalities 1
sparseQp 88.423 ms iterations 45 converged true
posQp 181.369 ms iterations 121 converged true
boundsQp 175.733 ms iterations 121 converged true
Qp Equality 2805.564 ms iterations 2230 converged true
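
The lambda/beta split quoted above, together with the proximal step usually paired with an L1 term, can be sketched as follows. The split formula comes from the comment itself; the soft-thresholding operator is the standard proximal operator of the L1 norm and is shown as an assumption about how such a term is handled, not as the code in Proximal.scala. The names are hypothetical.

```scala
object ElasticNetSketch {
  /** Splits lambda into (L1, L2) weights: L1 = lambda * beta, L2 = lambda * (1 - beta). */
  def split(lambda: Double, beta: Double): (Double, Double) =
    (lambda * beta, lambda * (1.0 - beta))

  /** Proximal operator of t * |x| (soft thresholding), applied element-wise. */
  def softThreshold(x: Array[Double], t: Double): Array[Double] =
    x.map(v => math.signum(v) * math.max(math.abs(v) - t, 0.0))
}
```

With lambda=1.0 and beta=0.99 as in the run above, almost all of the regularization weight goes to the L1 term, which is what drives entries to exactly zero in the sparse QP.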

 Quadratic Minimization for MLlib ALS
 

 Key: SPARK-2426
 URL: https://issues.apache.org/jira/browse/SPARK-2426
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Affects Versions: 1.0.0
Reporter: Debasish Das
Assignee: Debasish Das
   Original Estimate: 504h
  Remaining Estimate: 504h







[jira] [Comment Edited] (SPARK-2426) Quadratic Minimization for MLlib ALS

2014-10-31 Thread Debasish Das (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14191551#comment-14191551
 ] 

Debasish Das edited comment on SPARK-2426 at 10/31/14 8:04 AM:
---

[~mengxr] The matlab comparison scripts are open sourced over here:

https://github.com/debasish83/ecos/blob/master/matlab/admm/qprandom.m
https://github.com/debasish83/ecos/blob/master/matlab/pdco4/code/pdcotestQP.m

The detailed comparisons are in the README.md. Please look at the section on 
Matlab comparisons.

In a nutshell, for bounds MOSEK and ADMM are similar, for elastic net Proximal 
is 10X faster than MOSEK, and for equality MOSEK is 2-3X faster than 
Proximal, but both PDCO and ECOS produce much worse results than ADMM. 
Accelerated ADMM also did not work as well as default ADMM. Increasing the 
over-relaxation parameter helps ADMM but I have not explored it yet.

ADMM and PDCO are in Matlab but ECOS and MOSEK are both using mex files so they 
are expected to be more efficient.

Next I will add the performance results of running positivity, box, sparse 
coding / regularized lsi and robust-plsa on MovieLens dataset and validate 
product recommendation using the MAP measure...In terms of RMSE, default  
positive  sparse coding...

What are the largest datasets the LDA PRs are running on? I would like to try them 
with sparse coding as well...According to these papers, sparse coding/RLSI should 
give results on par with LDA:

https://www.cs.cmu.edu/~xichen/images/SLSA-sdm11-final.pdf
http://web.stanford.edu/group/mmds/slides2012/s-hli.pdf

The same randomized matrices can be generated and run in the PR as follows:

./bin/spark-class org.apache.spark.mllib.optimization.QuadraticMinimizer 1000 1 
1.0 0.99

rank=1000, equality=1.0 lambda=1.0 beta=0.99
L1regularization = lambda*beta L2regularization = lambda*(1-beta)

Generating randomized QPs with rank 1000 equalities 1
sparseQp 88.423 ms iterations 45 converged true
posQp 181.369 ms iterations 121 converged true
boundsQp 175.733 ms iterations 121 converged true
Qp Equality 2805.564 ms iterations 2230 converged true


 Quadratic Minimization for MLlib ALS
 

 Key: SPARK-2426
 URL: https://issues.apache.org/jira/browse/SPARK-2426
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Affects Versions: 1.0.0
Reporter: Debasish Das
Assignee: Debasish Das
   Original Estimate: 504h
  Remaining Estimate: 504h


[jira] [Comment Edited] (SPARK-2426) Quadratic Minimization for MLlib ALS

2014-10-31 Thread Debasish Das (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095232#comment-14095232
 ] 

Debasish Das edited comment on SPARK-2426 at 10/31/14 4:20 PM:
---

Hi Xiangrui,

The branch is ready for an initial review. I will do a lot of clean-up this week. 

I need some advice on whether we should bring the additional ALS features first 
or integrate NNLS with QuadraticMinimizer so that we can handle large ranks as 
well. 

https://github.com/debasish83/spark/commits/qp-als

optimization/QuadraticMinimizer.scala is the placeholder for all 
QuadraticMinimization. 

Right now we support 5 features:

1. Least square
2. Quadratic minimization with positivity
3. Quadratic minimization with box constraints: a generalization of positivity
4. Quadratic minimization with elastic net: L1 is fixed at 0.99; elastic net 
control is not exposed to users 
5. Quadratic minimization with affine constraints and bounds
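
To illustrate why the box constraint generalizes positivity, here is a minimal 
sketch of the projection step such a solver could use. The names 
(BoxProjection, project, projectPositive) are hypothetical and are not the API 
of QuadraticMinimizer.scala.

```scala
// Hypothetical sketch, not the PR's code: Euclidean projection onto a box.
object BoxProjection {
  // Clamps each entry of x into [lower(i), upper(i)].
  def project(x: Array[Double],
              lower: Array[Double],
              upper: Array[Double]): Array[Double] = {
    require(x.length == lower.length && x.length == upper.length,
      "dimension mismatch")
    Array.tabulate(x.length)(i => math.min(upper(i), math.max(lower(i), x(i))))
  }

  // Positivity is the special case lower = 0 and upper = +infinity.
  def projectPositive(x: Array[Double]): Array[Double] =
    project(x, Array.fill(x.length)(0.0),
      Array.fill(x.length)(Double.PositiveInfinity))
}
```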

Proximal.scala contains many regularizers that can be reused in the mllib 
updaters. L1Updater in mllib is an example of a proximal algorithm.
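
As a reminder of what a proximal step for L1 looks like, the sketch below 
implements elementwise soft thresholding, which is the proximal operator of 
t * ||x||_1. The object name SoftThreshold is hypothetical; this is not the 
code from Proximal.scala or from L1Updater.

```scala
// Hypothetical sketch: proximal operator of t * ||x||_1 (soft thresholding).
object SoftThreshold {
  // Shrinks every entry toward zero by t; entries with |v| <= t become zero.
  def prox(x: Array[Double], t: Double): Array[Double] = {
    require(t >= 0.0, "threshold must be non-negative")
    x.map(v => math.signum(v) * math.max(0.0, math.abs(v) - t))
  }
}
```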

QuadraticMinimizer is currently optimized for direct solves (Cholesky or LU, 
depending on the problem being solved).

The CG core from Breeze will be used for iterative solves when ranks are high. 
I need a different variant of CG for Formulation 5, so Breeze CG is not 
sufficient for all the formulations this branch supports and needs to be 
extended.

Right now I am experimenting with ADMM rho and lambda values so that the NNLS 
iterations are on par with Least square with positivity. The ideas for rho and 
lambda tuning are the following:

1. Derive an optimal value of lambda for quadratic problems, similar to the 
idea of Nesterov's acceleration used in algorithms like FISTA and accelerated 
ADMM from UCLA
2. Derive rho from approximate min and max eigenvalues of gram matrix 
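
A rough sketch of item 2 is given below. It approximates the extreme 
eigenvalues of a symmetric positive semidefinite Gram matrix with power 
iteration. The final formula rho = sqrt(lambdaMin * lambdaMax) is an 
ASSUMPTION (a common heuristic for ADMM on quadratic problems), not 
necessarily the rule used in this branch, and all names (AdmmRhoSketch, 
maxEigen, minEigen, rho) are hypothetical.

```scala
import scala.util.Random

// Hypothetical sketch, not the branch's code.
object AdmmRhoSketch {
  private def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  private def multiply(m: Array[Array[Double]], v: Array[Double]): Array[Double] =
    m.map(row => dot(row, v))

  // Power iteration with a seeded random start; returns the Rayleigh
  // quotient, which approximates the largest eigenvalue of a symmetric
  // positive semidefinite matrix.
  def maxEigen(m: Array[Array[Double]], iters: Int = 200, seed: Long = 42L): Double = {
    val rng = new Random(seed)
    var v = Array.fill(m.length)(rng.nextGaussian())
    var lambda = 0.0
    for (_ <- 0 until iters) {
      val norm = math.sqrt(dot(v, v))
      if (norm > 0.0) {
        v = v.map(_ / norm)
        val w = multiply(m, v)
        lambda = dot(v, w) // Rayleigh quotient for the unit vector v
        v = w
      }
    }
    lambda
  }

  // Smallest eigenvalue via power iteration on (lambdaMax * I - M), whose
  // largest eigenvalue is lambdaMax - lambdaMin.
  def minEigen(m: Array[Array[Double]], lambdaMax: Double, iters: Int = 200): Double = {
    val shifted = Array.tabulate(m.length, m.length) { (i, j) =>
      (if (i == j) lambdaMax else 0.0) - m(i)(j)
    }
    lambdaMax - maxEigen(shifted, iters)
  }

  // ASSUMPTION: rho is taken as the geometric mean of the extreme eigenvalues.
  def rho(gram: Array[Array[Double]]): Double = {
    val hi = maxEigen(gram)
    val lo = math.max(minEigen(gram, hi), 0.0)
    math.sqrt(lo * hi)
  }
}
```

Power iteration converges slowly when the two largest eigenvalues are close, 
so in practice only a few iterations would be run to get a rough estimate.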

In Matlab-based experiments with PDCO, ECOS (IPM), MOSEK and ADMM variants, 
ADMM is faster while producing results within 1e-4 of MOSEK. I will publish the 
numbers and the Matlab script through the ECOS jnilib open source project (GPL 
licensed). I did not add any ECOS code here so that everything stays Apache 
licensed.

For the topic modeling use case, I expect to produce sparse coding results (L1 
on product factors, L2 on user factors).

Example runs:

NMF:

./bin/spark-submit --total-executor-cores 4 --master spark://localhost:7077 
--jars ~/.m2/repository/com/github/scopt/scopt_2.10/3.2.0/scopt_2.10-3.2.0.jar 
--class org.apache.spark.examples.mllib.MovieLensALS 
./examples/target/spark-examples_2.10-1.1.0-SNAPSHOT.jar --rank 20 
--numIterations 10 --userConstraint POSITIVE --lambdaUser 0.065 
--productConstraint POSITIVE --lambdaProduct 0.065 --kryo 
hdfs://localhost:8020/sandbox/movielens/

Sparse coding:

./bin/spark-submit --total-executor-cores 4 --master spark://localhost:7077 
--jars ~/.m2/repository/com/github/scopt/scopt_2.10/3.2.0/scopt_2.10-3.2.0.jar 
--class org.apache.spark.examples.mllib.MovieLensALS 
./examples/target/spark-examples_2.10-1.1.0-SNAPSHOT.jar --delimiter   --rank 
20 --numIterations 10 --userConstraint SMOOTH --lambdaUser 0.065 
--productConstraint SPARSE --lambdaProduct 0.065 --kryo 
hdfs://localhost:8020/sandbox/movielens

Robust PLSA with least square loss:

./bin/spark-submit --total-executor-cores 4 --master spark://localhost:7077 
--jars ~/.m2/repository/com/github/scopt/scopt_2.10/3.2.0/scopt_2.10-3.2.0.jar 
--class org.apache.spark.examples.mllib.MovieLensALS 
./examples/target/spark-examples_2.10-1.1.0-SNAPSHOT.jar --delimiter   --rank 
20 --numIterations 10 --userConstraint EQUALITY --lambdaUser 0.065 
--productConstraint EQUALITY --lambdaProduct 0.065 --kryo 
hdfs://localhost:8020/sandbox/movielens

With this change, users can apply user- and product-specific constraints: for 
example, positive factors for products (interpretability) and smooth factors 
for users to get further RMSE improvements.

Thanks.
Deb


was (Author: debasish83):
Hi Xiangrui,

The branch is ready for an initial review. I will do a lot of clean-up this week. 

I need some advice on whether we should bring the additional ALS features first 
or integrate NNLS with QuadraticMinimizer so that we can handle large ranks as 
well. 

https://github.com/debasish83/spark/commits/qp-als

optimization/QuadraticMinimizer.scala is the placeholder for all 
QuadraticMinimization. 

Right now we support 5 features:

1. Least square
2. Least square with positivity
3. Least square with bounds : generalization of positivity
4. Least square with equality and positivity/bounds for LDA/PLSA
5. Least square + L1 constraint for sparse NMF

Proximal.scala contains many regularizers that can be reused in the mllib 
updaters. L1Updater in mllib is an example of a proximal algorithm.

QuadraticMinimizer is currently optimized for direct solves (Cholesky or LU, 
depending on the problem being solved).


[jira] [Commented] (SPARK-2426) Quadratic Minimization for MLlib ALS

2014-10-31 Thread Debasish Das (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14191997#comment-14191997
 ] 

Debasish Das commented on SPARK-2426:
-

Matlab comparisons of MOSEK, ECOS, PDCO and ADMM are over here:
https://github.com/debasish83/ecos/blob/master/README.md

MOSEK is available for research purposes. Let me know if there are issues in 
running the matlab scripts.

 Quadratic Minimization for MLlib ALS
 

 Key: SPARK-2426
 URL: https://issues.apache.org/jira/browse/SPARK-2426
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Affects Versions: 1.0.0
Reporter: Debasish Das
Assignee: Debasish Das
   Original Estimate: 504h
  Remaining Estimate: 504h

 Current ALS supports least squares and nonnegative least squares.
 I presented ADMM and IPM based Quadratic Minimization solvers to be used for 
 the following ALS problems:
 1. ALS with bounds
 2. ALS with L1 regularization
 3. ALS with Equality constraint and bounds
 Initial runtime comparisons are presented at Spark Summit. 
 http://spark-summit.org/2014/talk/quadratic-programing-solver-for-non-negative-matrix-factorization-with-spark
 Based on Xiangrui's feedback I am currently comparing the ADMM based 
 Quadratic Minimization solvers with IPM based QpSolvers and the default 
 ALS/NNLS. I will keep updating the runtime comparison results.
 For integration the detailed plan is as follows:
 1. Add QuadraticMinimizer and Proximal algorithms in mllib.optimization
 2. Integrate QuadraticMinimizer in mllib ALS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2426) Quadratic Minimization for MLlib ALS

2014-10-31 Thread Debasish Das (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14192935#comment-14192935
 ] 

Debasish Das commented on SPARK-2426:
-

Refactored QuadraticMinimizer and NNLS from mllib optimization to 
breeze.optimize.quadratic
https://github.com/scalanlp/breeze/pull/321
I will update the PR as well, but the latest Breeze depends on Scala 2.11 while 
Spark still uses 2.10.
All license and copyright information has also moved to Breeze, so Spark needs 
no changes to its license/notice files.




[jira] [Reopened] (SPARK-3987) NNLS generates incorrect result

2014-10-30 Thread Debasish Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Debasish Das reopened SPARK-3987:
-

I can send you a further list of failures; this is one more example. I 
strongly suggest moving to a robust convergence criterion inside NNLS rather 
than adding hacks on step sizes.

P = [0.986619, 0.639909, 0.748906, 0.900377, 0.688079, 0.734711, 0.835164, 
0.723973, 0.822436, 0.852591, 0.699979, 0.609533, 0.559504, 0.708015, 0.544744, 
0.658359, 0.632510, 0.751316, 0.653993, 0.642734, 0.799106, 0.898689, 0.712825, 
0.878405, 0.849565; 0.639909, 1.055175, 0.884940, 0.975502, 0.815121, 0.845699, 
0.899780, 0.709264, 0.960949, 1.021108, 0.896508, 0.692635, 0.659746, 0.809355, 
0.539466, 0.730501, 0.639971, 0.881502, 0.840159, 0.628515, 0.917052, 0.950677, 
0.823301, 1.022355, 0.994935; 0.748906, 0.884940, 1.421868, 1.175203, 0.986093, 
1.028669, 1.091437, 0.849192, 1.204776, 1.249037, 1.115160, 0.815680, 0.772715, 
0.971263, 0.621197, 0.875925, 0.757031, 1.034483, 1.022001, 0.705077, 1.115237, 
1.164796, 0.983688, 1.226847, 1.180170; 0.900377, 0.975502, 1.175203, 1.67, 
1.065369, 1.109756, 1.181625, 0.944176, 1.220418, 1.328803, 1.156144, 0.984967, 
0.916500, 1.046903, 0.728221, 0.991042, 0.855095, 1.181719, 1.095485, 0.901193, 
1.214851, 1.277434, 1.077374, 1.372354, 1.356724; 0.688079, 0.815121, 0.986093, 
1.065369, 1.249908, 0.953824, 1.004765, 0.771115, 1.082255, 1.161322, 1.021400, 
0.757474, 0.736266, 0.923406, 0.598005, 0.812629, 0.706870, 1.011984, 0.968135, 
0.682813, 1.034818, 1.039625, 0.937088, 1.152792, 1.121475; 0.734711, 0.845699, 
1.028669, 1.109756, 0.953824, 1.334869, 1.091564, 0.824709, 1.157992, 1.226413, 
1.045522, 0.731169, 0.709382, 0.980030, 0.634635, 0.853988, 0.758256, 1.070744, 
0.997542, 0.692832, 1.118828, 1.119519, 0.977618, 1.202464, 1.152186; 0.835164, 
0.899780, 1.091437, 1.181625, 1.004765, 1.091564, 1.566492, 0.994066, 1.307932, 
1.301575, 1.075206, 0.697477, 0.670553, 1.068449, 0.725819, 0.908391, 0.877151, 
1.083915, 0.993647, 0.735423, 1.202928, 1.275592, 1.040266, 1.242863, 1.150213; 
0.723973, 0.709264, 0.849192, 0.944176, 0.771115, 0.824709, 0.994066, 1.291185, 
1.084907, 0.934737, 0.802436, 0.559117, 0.510659, 0.801938, 0.624240, 0.682461, 
0.761152, 0.633965, 0.633248, 0.622034, 0.826798, 1.084260, 0.777897, 0.868127, 
0.764337; 0.822436, 0.960949, 1.204776, 1.220418, 1.082255, 1.157992, 1.307932, 
1.084907, 1.821667, 1.386725, 1.220398, 0.710929, 0.683088, 1.116057, 0.719380, 
0.925482, 0.907590, 1.010205, 1.041999, 0.673723, 1.224967, 1.373886, 1.082813, 
1.261720, 1.130003; 0.852591, 1.021108, 1.249037, 1.328803, 1.161322, 1.226413, 
1.301575, 0.934737, 1.386725, 1.831419, 1.289097, 0.885418, 0.868685, 1.190024, 
0.740180, 1.034238, 0.884778, 1.342867, 1.252762, 0.811797, 1.374303, 1.319088, 
1.190385, 1.478896, 1.427420; 0.699979, 0.896508, 1.115160, 1.156144, 1.021400, 
1.045522, 1.075206, 0.802436, 1.220398, 1.289097, 1.501713, 0.831799, 0.807718, 
0.994071, 0.599941, 0.876646, 0.730630, 1.074102, 1.06, 0.677923, 1.126406, 
1.127440, 1.013104, 1.257194, 1.217591; 0.609533, 0.692635, 0.815680, 0.984967, 
0.757474, 0.731169, 0.697477, 0.559117, 0.710929, 0.885418, 0.831799, 1.206659, 
0.820404, 0.673313, 0.477307, 0.698907, 0.529141, 0.853290, 0.812957, 0.721698, 
0.775190, 0.790003, 0.746242, 0.994816, 1.053857; 0.559504, 0.659746, 0.772715, 
0.916500, 0.736266, 0.709382, 0.670553, 0.510659, 0.683088, 0.868685, 0.807718, 
0.820404, 1.103586, 0.666515, 0.455429, 0.667077, 0.500250, 0.855895, 0.807716, 
0.674035, 0.759131, 0.732206, 0.732658, 0.965557, 1.021876; 0.708015, 0.809355, 
0.971263, 1.046903, 0.923406, 0.980030, 1.068449, 0.801938, 1.116057, 1.190024, 
0.994071, 0.673313, 0.666515, 1.293629, 0.631919, 0.822011, 0.745899, 1.060809, 
0.965440, 0.672388, 1.090059, 1.069719, 0.959136, 1.162012, 1.109763; 0.544744, 
0.539466, 0.621197, 0.728221, 0.598005, 0.634635, 0.725819, 0.624240, 0.719380, 
0.740180, 0.599941, 0.477307, 0.455429, 0.631919, 0.802572, 0.551340, 0.549393, 
0.654652, 0.567237, 0.531461, 0.684655, 0.749979, 0.627718, 0.744096, 0.710310; 
0.658359, 0.730501, 0.875925, 0.991042, 0.812629, 0.853988, 0.908391, 0.682461, 
0.925482, 1.034238, 0.876646, 0.698907, 0.667077, 0.822011, 0.551340, 1.075722, 
0.642962, 0.955470, 0.862026, 0.658120, 0.955639, 0.944118, 0.835790, 1.057347, 
1.042521; 0.632510, 0.639971, 0.757031, 0.855095, 0.706870, 0.758256, 0.877151, 
0.761152, 0.907590, 0.884778, 0.730630, 0.529141, 0.500250, 0.745899, 0.549393, 
0.642962, 0.976258, 0.726041, 0.658693, 0.579842, 0.811285, 0.917944, 0.731996, 
0.859613, 0.798469; 0.751316, 0.881502, 1.034483, 1.181719, 1.011984, 1.070744, 
1.083915, 0.633965, 1.010205, 1.342867, 1.074102, 0.853290, 0.855895, 1.060809, 
0.654652, 0.955470, 0.726041, 1.779551, 1.210339, 0.817870, 1.289494, 1.030990, 
1.082950, 1.419900, 1.451896; 0.653993, 0.840159, 1.022001, 1.095485, 0.968135, 
0.997542, 

[jira] [Commented] (SPARK-3987) NNLS generates incorrect result

2014-10-30 Thread Debasish Das (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14191245#comment-14191245
 ] 

Debasish Das commented on SPARK-3987:
-

NNLS iters 36 result 
0.13010360222362655,0.1268399356245685,0.07256472682635416,0.15415258739697485,0.14472814692821925,0.12993720335014108,0.12116579552952525,0.16145040854270917,0.19919730253363563,0.18716812848138634,0.1594670311402431,0.1442692338314524,0.11740410727778867,0.10929848737016828,0.08690057753031168,0.22139114605899224,0.0,0.17404384335673376,0.16208039794069887,0.04543896291399707

 NNLS generates incorrect result
 ---

 Key: SPARK-3987
 URL: https://issues.apache.org/jira/browse/SPARK-3987
 Project: Spark
  Issue Type: Bug
  Components: MLlib
Affects Versions: 1.1.0
Reporter: Debasish Das
Assignee: Shuo Xiang
 Fix For: 1.1.1, 1.2.0


 Hi,
 Please see the example gram matrix and linear term:
 val P2 = new DoubleMatrix(20, 20, 333907.312770, -60814.043975, 
 207935.829941, -162881.367739, -43730.396770, 17511.428983, -243340.496449, 
 -225245.957922, 104700.445881, 32430.845099, 336378.693135, -373497.970207, 
 -41147.159621, 53928.060360, -293517.883778, 53105.278068, 0.00, 
 -85257.781696, 84913.970469, -10584.080103, -60814.043975, 13826.806664, 
 -38032.612640, 33475.833875, 10791.916809, -1040.950810, 48106.552472, 
 45390.073380, -16310.282190, -2861.455903, -60790.833191, 73109.516544, 
 9826.614644, -8283.992464, 56991.742991, -6171.366034, 0.00, 
 19152.382499, -13218.721710, 2793.734234, 207935.829941, -38032.612640, 
 129661.677608, -101682.098412, -27401.299347, 10787.713362, -151803.006149, 
 -140563.601672, 65067.935324, 20031.263383, 209521.268600, -232958.054688, 
 -25764.179034, 33507.951918, -183046.845592, 32884.782835, 0.00, 
 -53315.811196, 52770.762546, -6642.187643, -162881.367739, 33475.833875, 
 -101682.098412, 85094.407608, 25422.850782, -5437.646141, 124197.166330, 
 116206.265909, -47093.484134, -11420.168521, -163429.436848, 189574.783900, 
 23447.172314, -24087.375367, 148311.355507, -20848.385466, 0.00, 
 46835.814559, -38180.352878, 6415.873901, -43730.396770, 10791.916809, 
 -27401.299347, 25422.850782, 8882.869799, 15.638084, 35933.473986, 
 34186.371325, -10745.330690, -974.314375, -43537.709621, 54371.010558, 
 7894.453004, -5408.929644, 42231.381747, -3192.010574, 0.00, 
 15058.753110, -8704.757256, 2316.581535, 17511.428983, -1040.950810, 
 10787.713362, -5437.646141, 15.638084, 2794.949847, -9681.950987, 
 -8258.171646, 7754.358930, 4193.359412, 18052.143842, -15456.096769, 
 -253.356253, 4089.672804, -12524.380088, 5651.579348, 0.00, -1513.302547, 
 6296.461898, 152.427321, -243340.496449, 48106.552472, -151803.006149, 
 124197.166330, 35933.473986, -9681.950987, 182931.600236, 170454.352953, 
 -72361.174145, -19270.461728, -244518.179729, 279551.060579, 33340.452802, 
 -37103.267653, 219025.288975, -33687.141423, 0.00, 67347.950443, 
 -58673.009647, 8957.800259, -225245.957922, 45390.073380, -140563.601672, 
 116206.265909, 34186.371325, -8258.171646, 170454.352953, 159322.942894, 
 -66074.960534, -16839.743193, -226173.967766, 260421.044094, 31624.194003, 
 -33839.612565, 203889.695169, -30034.828909, 0.00, 63525.040745, 
 -53572.741748, 8575.071847, 104700.445881, -16310.282190, 65067.935324, 
 -47093.484134, -10745.330690, 7754.358930, -72361.174145, -66074.960534, 
 35869.598076, 13378.653317, 106033.647837, -111831.682883, -10455.465743, 
 18537.392481, -88370.612394, 20344.288488, 0.00, -22935.482766, 
 29004.543704, -2409.461759, 32430.845099, -2861.455903, 20031.263383, 
 -11420.168521, -974.314375, 4193.359412, -19270.461728, -16839.743193, 
 13378.653317, 6802.081898, 33256.395091, -30421.985199, -1296.785870, 
 7026.518692, -24443.378205, 9221.982599, 0.00, -4088.076871, 
 10861.014242, -25.092938, 336378.693135, -60790.833191, 209521.268600, 
 -163429.436848, -43537.709621, 18052.143842, -244518.179729, -226173.967766, 
 106033.647837, 33256.395091, 339200.268106, -375442.716811, -41027.594509, 
 54636.778527, -295133.248586, 54177.278365, 0.00, -85237.666701, 
 85996.957056, -10503.209968, -373497.970207, 73109.516544, -232958.054688, 
 189574.783900, 54371.010558, -15456.096769, 279551.060579, 260421.044094, 
 -111831.682883, -30421.985199, -375442.716811, 427793.208465, 50528.074431, 
 -57375.986301, 335203.382015, -52676.385869, 0.00, 102368.307670, 
 -90679.792485, 13509.390393, -41147.159621, 9826.614644, -25764.179034, 
 23447.172314, 7894.453004, -253.356253, 33340.452802, 31624.194003, 
 -10455.465743, -1296.785870, -41027.594509, 50528.074431, 7255.977434, 
 -5281.636812, 39298.355527, -3440.450858, 0.00, 13717.870243, 
 -8471.405582, 2071.812204, 53928.060360, -8283.992464, 33507.951918, 
 

[jira] [Commented] (SPARK-3987) NNLS generates incorrect result

2014-10-30 Thread Debasish Das (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14191277#comment-14191277
 ] 

Debasish Das commented on SPARK-3987:
-

Were there more changes than the step size in your check-in? I still have not 
updated my branch; I only changed the step size and ran. Let me update the 
branch and re-run the full tests.

 NNLS generates incorrect result
 ---

 Key: SPARK-3987
 URL: https://issues.apache.org/jira/browse/SPARK-3987
 Project: Spark
  Issue Type: Bug
  Components: MLlib
Affects Versions: 1.1.0
Reporter: Debasish Das
Assignee: Shuo Xiang
 Fix For: 1.1.1, 1.2.0


 Hi,
 Please see the example gram matrix and linear term:
 val P2 = new DoubleMatrix(20, 20, 333907.312770, -60814.043975, 
 207935.829941, -162881.367739, -43730.396770, 17511.428983, -243340.496449, 
 -225245.957922, 104700.445881, 32430.845099, 336378.693135, -373497.970207, 
 -41147.159621, 53928.060360, -293517.883778, 53105.278068, 0.00, 
 -85257.781696, 84913.970469, -10584.080103, -60814.043975, 13826.806664, 
 -38032.612640, 33475.833875, 10791.916809, -1040.950810, 48106.552472, 
 45390.073380, -16310.282190, -2861.455903, -60790.833191, 73109.516544, 
 9826.614644, -8283.992464, 56991.742991, -6171.366034, 0.00, 
 19152.382499, -13218.721710, 2793.734234, 207935.829941, -38032.612640, 
 129661.677608, -101682.098412, -27401.299347, 10787.713362, -151803.006149, 
 -140563.601672, 65067.935324, 20031.263383, 209521.268600, -232958.054688, 
 -25764.179034, 33507.951918, -183046.845592, 32884.782835, 0.00, 
 -53315.811196, 52770.762546, -6642.187643, -162881.367739, 33475.833875, 
 -101682.098412, 85094.407608, 25422.850782, -5437.646141, 124197.166330, 
 116206.265909, -47093.484134, -11420.168521, -163429.436848, 189574.783900, 
 23447.172314, -24087.375367, 148311.355507, -20848.385466, 0.00, 
 46835.814559, -38180.352878, 6415.873901, -43730.396770, 10791.916809, 
 -27401.299347, 25422.850782, 8882.869799, 15.638084, 35933.473986, 
 34186.371325, -10745.330690, -974.314375, -43537.709621, 54371.010558, 
 7894.453004, -5408.929644, 42231.381747, -3192.010574, 0.00, 
 15058.753110, -8704.757256, 2316.581535, 17511.428983, -1040.950810, 
 10787.713362, -5437.646141, 15.638084, 2794.949847, -9681.950987, 
 -8258.171646, 7754.358930, 4193.359412, 18052.143842, -15456.096769, 
 -253.356253, 4089.672804, -12524.380088, 5651.579348, 0.00, -1513.302547, 
 6296.461898, 152.427321, -243340.496449, 48106.552472, -151803.006149, 
 124197.166330, 35933.473986, -9681.950987, 182931.600236, 170454.352953, 
 -72361.174145, -19270.461728, -244518.179729, 279551.060579, 33340.452802, 
 -37103.267653, 219025.288975, -33687.141423, 0.00, 67347.950443, 
 -58673.009647, 8957.800259, -225245.957922, 45390.073380, -140563.601672, 
 116206.265909, 34186.371325, -8258.171646, 170454.352953, 159322.942894, 
 -66074.960534, -16839.743193, -226173.967766, 260421.044094, 31624.194003, 
 -33839.612565, 203889.695169, -30034.828909, 0.00, 63525.040745, 
 -53572.741748, 8575.071847, 104700.445881, -16310.282190, 65067.935324, 
 -47093.484134, -10745.330690, 7754.358930, -72361.174145, -66074.960534, 
 35869.598076, 13378.653317, 106033.647837, -111831.682883, -10455.465743, 
 18537.392481, -88370.612394, 20344.288488, 0.00, -22935.482766, 
 29004.543704, -2409.461759, 32430.845099, -2861.455903, 20031.263383, 
 -11420.168521, -974.314375, 4193.359412, -19270.461728, -16839.743193, 
 13378.653317, 6802.081898, 33256.395091, -30421.985199, -1296.785870, 
 7026.518692, -24443.378205, 9221.982599, 0.00, -4088.076871, 
 10861.014242, -25.092938, 336378.693135, -60790.833191, 209521.268600, 
 -163429.436848, -43537.709621, 18052.143842, -244518.179729, -226173.967766, 
 106033.647837, 33256.395091, 339200.268106, -375442.716811, -41027.594509, 
 54636.778527, -295133.248586, 54177.278365, 0.00, -85237.666701, 
 85996.957056, -10503.209968, -373497.970207, 73109.516544, -232958.054688, 
 189574.783900, 54371.010558, -15456.096769, 279551.060579, 260421.044094, 
 -111831.682883, -30421.985199, -375442.716811, 427793.208465, 50528.074431, 
 -57375.986301, 335203.382015, -52676.385869, 0.00, 102368.307670, 
 -90679.792485, 13509.390393, -41147.159621, 9826.614644, -25764.179034, 
 23447.172314, 7894.453004, -253.356253, 33340.452802, 31624.194003, 
 -10455.465743, -1296.785870, -41027.594509, 50528.074431, 7255.977434, 
 -5281.636812, 39298.355527, -3440.450858, 0.00, 13717.870243, 
 -8471.405582, 2071.812204, 53928.060360, -8283.992464, 33507.951918, 
 -24087.375367, -5408.929644, 4089.672804, -37103.267653, -33839.612565, 
 18537.392481, 7026.518692, 54636.778527, -57375.986301, -5281.636812, 
 9735.061160, -45360.674033, 10634.633559, 0.00, -11652.364691, 
 15039.566630, 

[jira] [Commented] (SPARK-3987) NNLS generates incorrect result

2014-10-30 Thread Debasish Das (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14191387#comment-14191387
 ] 

Debasish Das commented on SPARK-3987:
-

[~mengxr] this came out of an internal dataset while running ALS, so I can't 
point to the dataset. I might not have got all the changes, so I am updating my 
branch and will re-run this validation over the dataset. Basically, I check for 
cases where NNLS and QuadraticMinimization don't match and dump them out for 
further analysis.


[jira] [Commented] (SPARK-3987) NNLS generates incorrect result

2014-10-22 Thread Debasish Das (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180001#comment-14180001
 ] 

Debasish Das commented on SPARK-3987:
-

I will test it, but this is how I called NNLS, assuming P2 and q2 are the jblas 
matrices mentioned above:

// NNLS receives the negated linear term
val nnlsResult2 = NNLS.solve(P2, q2.mul(-1), ws)
println(s"NNLS iters ${ws.iterations} result ${nnlsResult2.toList.mkString(",")}")

// QuadraticMinimizer receives the linear term as-is
val (posResult2, posConverged2) = qpIters.solve(P2, q2)
println(s"QuadraticMinimizer iters ${qpIters.iterations} result ${posResult2.toString()}")

I did multiply q2 by -1 before passing it to NNLS.

Is that the right way to call NNLS ?
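
One way to sanity-check the sign convention is to compare the two objectives 
directly. The sketch below ASSUMES (this thread does not confirm it) that 
QuadraticMinimizer minimizes 0.5 * x'Px + q'x while NNLS minimizes 
0.5 * x'(A'A)x - (A'b)'x, so that passing atb = -q makes the two objectives 
identical. The object and method names are hypothetical.

```scala
// Hypothetical sketch for checking the sign convention on small inputs.
object SignConvention {
  private def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  // 0.5 * x' M x for a square matrix M stored as rows.
  private def halfQuadForm(m: Array[Array[Double]], x: Array[Double]): Double =
    0.5 * dot(x, m.map(row => dot(row, x)))

  // Assumed QP form: 0.5 * x'Px + q'x.
  def qpObjective(p: Array[Array[Double]], q: Array[Double], x: Array[Double]): Double =
    halfQuadForm(p, x) + dot(q, x)

  // Assumed NNLS form: 0.5 * x'(A'A)x - (A'b)'x.
  def nnlsObjective(ata: Array[Array[Double]], atb: Array[Double], x: Array[Double]): Double =
    halfQuadForm(ata, x) - dot(atb, x)
}
```

Under these assumptions, qpObjective(P, q, x) equals 
nnlsObjective(P, q.map(-_), x) for every x, which is what multiplying q2 by -1 
relies on.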


[jira] [Commented] (SPARK-3987) NNLS generates incorrect result

2014-10-22 Thread Debasish Das (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180314#comment-14180314
 ] 

Debasish Das commented on SPARK-3987:
-

[~coderxiang] Changing 1e-6 to 1e-7 fixes this error, but why bound the step at 
1e-6 or 1e-7 at all? Why not let the algorithm converge using the standard 
criteria used in BFGS / OWLQN (gradient norm and the last 5 iterates)? You can 
also extend the Breeze linear CG to implement NNLS.
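
A minimal sketch of the kind of stopping rule suggested above, as an alternative 
to a hard bound on the step size. The object name, the tolerances and the exact 
form of the two tests are assumptions made for illustration; this is not the 
rule that Breeze or the MLlib NNLS code actually implements.

```scala
object ConvergenceSketch {
  /** Euclidean norm of a vector. */
  def norm2(v: Array[Double]): Double = math.sqrt(v.map(x => x * x).sum)

  /**
   * Hypothetical stopping rule combining a gradient-norm test with a
   * "no progress over the last `window` iterates" test.
   *
   * @param grad    gradient at the current iterate
   * @param history objective values, oldest first, current value last
   */
  def converged(grad: Array[Double], history: Seq[Double],
                gradTol: Double = 1e-6, improveTol: Double = 1e-9,
                window: Int = 5): Boolean = {
    val f = history.last
    val scale = math.max(1.0, math.abs(f))
    // Test 1: the gradient is small relative to the objective value.
    val smallGradient = norm2(grad) <= gradTol * scale
    // Test 2: the objective barely changed over the last `window` iterations.
    val stalled = history.length > window && {
      val old = history(history.length - 1 - window)
      math.abs(old - f) <= improveTol * scale
    }
    smallGradient || stalled
  }
}
```

A solver loop would append the objective value to `history` after every step 
and stop as soon as `converged` returns true, so the step size itself never 
needs a fixed lower bound.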

NNLS iters 36 result 
0.13010360222362655,0.1268399356245685,0.07256472682635416,0.15415258739697485,0.14472814692821925,0.12993720335014108,0.12116579552952525,0.16145040854270917,0.19919730253363563,0.18716812848138634,0.1594670311402431,0.1442692338314524,0.11740410727778867,0.10929848737016828,0.08690057753031168,0.22139114605899224,0.0,0.17404384335673376,0.16208039794069887,0.04543896291399707

QuadraticMinimizer iters 259 result [0.130104; 0.126840; 0.072565; 0.154153; 
0.144728; 0.129937; 0.121166; 0.161450; 0.199197; 0.187168; 0.159467; 0.144269; 
0.117404; 0.109298; 0.086901; 0.221391; 0.00; 0.174044; 0.162080; 0.045439]
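
The two result vectors above can be compared element by element. The sketch 
below copies both vectors from this comment; the object and method names are 
made up for illustration. Since the QuadraticMinimizer output is printed with 
six decimals, differences of up to 5e-7 are just rounding.

```scala
object CompareSolutions {
  // NNLS result after 36 iterations, copied from the comment above.
  val nnls: Array[Double] = Array(
    0.13010360222362655, 0.1268399356245685, 0.07256472682635416,
    0.15415258739697485, 0.14472814692821925, 0.12993720335014108,
    0.12116579552952525, 0.16145040854270917, 0.19919730253363563,
    0.18716812848138634, 0.1594670311402431, 0.1442692338314524,
    0.11740410727778867, 0.10929848737016828, 0.08690057753031168,
    0.22139114605899224, 0.0, 0.17404384335673376,
    0.16208039794069887, 0.04543896291399707)

  // QuadraticMinimizer result after 259 iterations, as printed (6 decimals).
  val qm: Array[Double] = Array(
    0.130104, 0.126840, 0.072565, 0.154153, 0.144728, 0.129937, 0.121166,
    0.161450, 0.199197, 0.187168, 0.159467, 0.144269, 0.117404, 0.109298,
    0.086901, 0.221391, 0.00, 0.174044, 0.162080, 0.045439)

  /** Largest absolute element-wise difference between two equal-length vectors. */
  def maxAbsDiff(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same length")
    a.zip(b).map { case (x, y) => math.abs(x - y) }.max
  }

  def main(args: Array[String]): Unit =
    println(s"max |nnls - qm| = ${maxAbsDiff(nnls, qm)}")
}
```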

I will run more tests. Based on this PR I am testing 
https://issues.apache.org/jira/browse/SPARK-2426 with userFeatures as SMOOTH and 
productFeatures as POSITIVE, and this particular case showed up there. For 
POSITIVE I run both NNLS and QuadraticMinimizer to compare the iteration counts 
and optimize on that.

 NNLS generates incorrect result
 ---

 Key: SPARK-3987
 URL: https://issues.apache.org/jira/browse/SPARK-3987
 Project: Spark
  Issue Type: Bug
  Components: MLlib
Affects Versions: 1.1.0
Reporter: Debasish Das
Assignee: Shuo Xiang


[jira] [Commented] (SPARK-2426) Quadratic Minimization for MLlib ALS

2014-10-19 Thread Debasish Das (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14176358#comment-14176358
 ] 

Debasish Das commented on SPARK-2426:
-

[~mengxr] I thought more about it. One of the reasons we chose ADMM is that 
QuadraticMinimizer is not designed to be a local algorithm.

If it runs on the Spark master it will take an RDD. If it runs on a Spark 
worker, it will take H and c from x'Hx + c'x along with the proximal operators.
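
To make the worker-side inputs concrete, here is a minimal sketch of a quadratic 
objective built from H and c together with three common proximal operators. All 
names are hypothetical and are not the API of QuadraticMinimizer or 
Proximal.scala. The 0.5 factor in the objective is an assumed convention; the 
comment above writes the objective as x'Hx + c'x.

```scala
object ProximalSketch {
  type Vec = Array[Double]

  private def dot(a: Vec, b: Vec): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  /** Quadratic objective 0.5 * x'Hx + c'x (the 0.5 factor is an assumption). */
  def objective(h: Array[Vec], c: Vec, x: Vec): Double = {
    val hx = h.map(row => dot(row, x))
    0.5 * dot(hx, x) + dot(c, x)
  }

  /** Positivity constraint: projection onto the set x >= 0. */
  def projectPos(x: Vec): Vec = x.map(math.max(0.0, _))

  /** Bound constraint: projection onto the box lb <= x <= ub. */
  def projectBox(x: Vec, lb: Double, ub: Double): Vec =
    x.map(v => math.min(ub, math.max(lb, v)))

  /** L1 regularization: soft-thresholding, the proximal operator of t * ||x||_1. */
  def shrinkL1(x: Vec, t: Double): Vec =
    x.map(v => math.signum(v) * math.max(math.abs(v) - t, 0.0))
}
```

Each operator is cheap and element-wise, which is why a solver can take the 
constraint as a plug-in function while the quadratic part stays the same.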

I will update the API and show some POCs of how this meta-algorithm will add 
LBFGS/Truncated Newton as a core solver for the x-solve, for the scalable 
version of matrix factorization where we never want to create the H matrix 
explicitly.

Truncated Newton is a better choice for the constraints we want to support. I 
am working on a variant of TRON and linear CG that's in Breeze for the scalable 
version. Those are the building blocks I need.

I am sure some of the code will move to Breeze. Proximal will definitely move 
to Breeze but QuadraticMinimizer will be refactored. It will really help if you 
can open up a PR on the new ALS design you have and we can work on it...

 Quadratic Minimization for MLlib ALS
 

 Key: SPARK-2426
 URL: https://issues.apache.org/jira/browse/SPARK-2426
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Affects Versions: 1.0.0
Reporter: Debasish Das
Assignee: Debasish Das
   Original Estimate: 504h
  Remaining Estimate: 504h

 Current ALS supports least squares and nonnegative least squares.
 I presented ADMM and IPM based Quadratic Minimization solvers to be used for 
 the following ALS problems:
 1. ALS with bounds
 2. ALS with L1 regularization
 3. ALS with Equality constraint and bounds
 Initial runtime comparisons are presented at Spark Summit. 
 http://spark-summit.org/2014/talk/quadratic-programing-solver-for-non-negative-matrix-factorization-with-spark
 Based on Xiangrui's feedback I am currently comparing the ADMM based 
 Quadratic Minimization solvers with IPM based QpSolvers and the default 
 ALS/NNLS. I will keep updating the runtime comparison results.
 For integration the detailed plan is as follows:
 1. Add QuadraticMinimizer and Proximal algorithms in mllib.optimization
 2. Integrate QuadraticMinimizer in mllib ALS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2426) Quadratic Minimization for MLlib ALS

2014-10-17 Thread Debasish Das (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14175167#comment-14175167
 ] 

Debasish Das commented on SPARK-2426:
-

1. [~mengxr] Our legal team was clear that the Stanford and Verizon copyrights 
should show up in the COPYRIGHT.txt file. I saw other companies' copyrights and 
did not think it would be a big issue.

2. For the new interface, we have two more requirements: a convex loss function 
(supporting Huber loss, hinge loss, etc.) and no explicit AtA construction, 
since once we start scaling to 1 factors for LSA the AtA construction will 
start to choke. Can I work on your branch? 
https://github.com/mengxr/spark-als/blob/master/src/main/scala/org/apache/spark/ml/SimpleALS.scala

3. I agree to refactor the core solver, including NNLS, to Breeze. That was the 
initial plan, but since we wanted to test the features on our internal 
datasets, integrating with mllib was faster. I am testing NNLS's CG 
implementation because, as soon as explicit AtA construction is taken out, we 
need to rely on CG in place of direct solvers. I will refactor the solver out 
to Breeze, and that will take the copyright messages to Breeze as well.

4. Let me add the Matlab scripts and point to the repository. ECOS and MOSEK 
will need Matlab to run. PDCO and Proximal variants will run fine on Octave. I 
am not sure if MOSEK is supported on Octave.
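
The "no explicit AtA construction" requirement in point 2 can be illustrated 
with a matrix-free conjugate gradient solve of (AtA + lambda*I) x = Atb that 
only uses products with A and its transpose. This is a plain dense sketch with 
hypothetical names, not the CG code inside NNLS or Breeze.

```scala
object MatrixFreeCG {
  type Vec = Array[Double]

  def dot(a: Vec, b: Vec): Double = {
    var s = 0.0
    var i = 0
    while (i < a.length) { s += a(i) * b(i); i += 1 }
    s
  }

  /** A * v, with A stored row by row. */
  def matVec(a: Array[Vec], v: Vec): Vec = a.map(row => dot(row, v))

  /** A' * w, computed without materializing the transpose. */
  def tMatVec(a: Array[Vec], w: Vec): Vec = {
    val out = new Array[Double](a(0).length)
    for (i <- a.indices; j <- out.indices) out(j) += a(i)(j) * w(i)
    out
  }

  /** Solves (A'A + lambda * I) x = A'b by CG; A'A is never formed. */
  def solve(a: Array[Vec], b: Vec, lambda: Double,
            maxIters: Int = 100, tol: Double = 1e-10): Vec = {
    val n = a(0).length
    // Applies v => A'(A v) + lambda * v using two matrix-vector products.
    def applyH(v: Vec): Vec = {
      val atav = tMatVec(a, matVec(a, v))
      Array.tabulate(n)(j => atav(j) + lambda * v(j))
    }
    val x = new Array[Double](n)
    val r = tMatVec(a, b) // residual for the starting point x = 0
    val p = r.clone()
    var rs = dot(r, r)
    var k = 0
    while (k < maxIters && math.sqrt(rs) > tol) {
      val hp = applyH(p)
      val alpha = rs / dot(p, hp)
      for (j <- 0 until n) { x(j) += alpha * p(j); r(j) -= alpha * hp(j) }
      val rsNew = dot(r, r)
      val beta = rsNew / rs
      for (j <- 0 until n) p(j) = r(j) + beta * p(j)
      rs = rsNew
      k += 1
    }
    x
  }
}
```

Memory stays proportional to the size of A plus a few vectors of length n, 
whereas the explicit gram matrix needs n * n entries. In a real setting the two 
products would be sparse or distributed operations.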




[jira] [Created] (SPARK-3987) NNLS generates incorrect result

2014-10-16 Thread Debasish Das (JIRA)
Debasish Das created SPARK-3987:
---

 Summary: NNLS generates incorrect result
 Key: SPARK-3987
 URL: https://issues.apache.org/jira/browse/SPARK-3987
 Project: Spark
  Issue Type: Bug
  Components: MLlib
Affects Versions: 1.1.0
Reporter: Debasish Das


Hi,

Please see the example gram matrix and linear term:

val P2 = new DoubleMatrix(20, 20, 333907.312770, -60814.043975, 207935.829941, 
-162881.367739, -43730.396770, 17511.428983, -243340.496449, -225245.957922, 
104700.445881, 32430.845099, 336378.693135, -373497.970207, -41147.159621, 
53928.060360, -293517.883778, 53105.278068, 0.00, -85257.781696, 
84913.970469, -10584.080103, -60814.043975, 13826.806664, -38032.612640, 
33475.833875, 10791.916809, -1040.950810, 48106.552472, 45390.073380, 
-16310.282190, -2861.455903, -60790.833191, 73109.516544, 9826.614644, 
-8283.992464, 56991.742991, -6171.366034, 0.00, 19152.382499, 
-13218.721710, 2793.734234, 207935.829941, -38032.612640, 129661.677608, 
-101682.098412, -27401.299347, 10787.713362, -151803.006149, -140563.601672, 
65067.935324, 20031.263383, 209521.268600, -232958.054688, -25764.179034, 
33507.951918, -183046.845592, 32884.782835, 0.00, -53315.811196, 
52770.762546, -6642.187643, -162881.367739, 33475.833875, -101682.098412, 
85094.407608, 25422.850782, -5437.646141, 124197.166330, 116206.265909, 
-47093.484134, -11420.168521, -163429.436848, 189574.783900, 23447.172314, 
-24087.375367, 148311.355507, -20848.385466, 0.00, 46835.814559, 
-38180.352878, 6415.873901, -43730.396770, 10791.916809, -27401.299347, 
25422.850782, 8882.869799, 15.638084, 35933.473986, 34186.371325, 
-10745.330690, -974.314375, -43537.709621, 54371.010558, 7894.453004, 
-5408.929644, 42231.381747, -3192.010574, 0.00, 15058.753110, -8704.757256, 
2316.581535, 17511.428983, -1040.950810, 10787.713362, -5437.646141, 15.638084, 
2794.949847, -9681.950987, -8258.171646, 7754.358930, 4193.359412, 
18052.143842, -15456.096769, -253.356253, 4089.672804, -12524.380088, 
5651.579348, 0.00, -1513.302547, 6296.461898, 152.427321, -243340.496449, 
48106.552472, -151803.006149, 124197.166330, 35933.473986, -9681.950987, 
182931.600236, 170454.352953, -72361.174145, -19270.461728, -244518.179729, 
279551.060579, 33340.452802, -37103.267653, 219025.288975, -33687.141423, 
0.00, 67347.950443, -58673.009647, 8957.800259, -225245.957922, 
45390.073380, -140563.601672, 116206.265909, 34186.371325, -8258.171646, 
170454.352953, 159322.942894, -66074.960534, -16839.743193, -226173.967766, 
260421.044094, 31624.194003, -33839.612565, 203889.695169, -30034.828909, 
0.00, 63525.040745, -53572.741748, 8575.071847, 104700.445881, 
-16310.282190, 65067.935324, -47093.484134, -10745.330690, 7754.358930, 
-72361.174145, -66074.960534, 35869.598076, 13378.653317, 106033.647837, 
-111831.682883, -10455.465743, 18537.392481, -88370.612394, 20344.288488, 
0.00, -22935.482766, 29004.543704, -2409.461759, 32430.845099, 
-2861.455903, 20031.263383, -11420.168521, -974.314375, 4193.359412, 
-19270.461728, -16839.743193, 13378.653317, 6802.081898, 33256.395091, 
-30421.985199, -1296.785870, 7026.518692, -24443.378205, 9221.982599, 0.00, 
-4088.076871, 10861.014242, -25.092938, 336378.693135, -60790.833191, 
209521.268600, -163429.436848, -43537.709621, 18052.143842, -244518.179729, 
-226173.967766, 106033.647837, 33256.395091, 339200.268106, -375442.716811, 
-41027.594509, 54636.778527, -295133.248586, 54177.278365, 0.00, 
-85237.666701, 85996.957056, -10503.209968, -373497.970207, 73109.516544, 
-232958.054688, 189574.783900, 54371.010558, -15456.096769, 279551.060579, 
260421.044094, -111831.682883, -30421.985199, -375442.716811, 427793.208465, 
50528.074431, -57375.986301, 335203.382015, -52676.385869, 0.00, 
102368.307670, -90679.792485, 13509.390393, -41147.159621, 9826.614644, 
-25764.179034, 23447.172314, 7894.453004, -253.356253, 33340.452802, 
31624.194003, -10455.465743, -1296.785870, -41027.594509, 50528.074431, 
7255.977434, -5281.636812, 39298.355527, -3440.450858, 0.00, 13717.870243, 
-8471.405582, 2071.812204, 53928.060360, -8283.992464, 33507.951918, 
-24087.375367, -5408.929644, 4089.672804, -37103.267653, -33839.612565, 
18537.392481, 7026.518692, 54636.778527, -57375.986301, -5281.636812, 
9735.061160, -45360.674033, 10634.633559, 0.00, -11652.364691, 
15039.566630, -1202.539106, -293517.883778, 56991.742991, -183046.845592, 
148311.355507, 42231.381747, -12524.380088, 219025.288975, 203889.695169, 
-88370.612394, -24443.378205, -295133.248586, 335203.382015, 39298.355527, 
-45360.674033, 262923.925938, -42012.606885, 0.00, 79810.919951, 
-71657.856143, 10464.327491, 53105.278068, -6171.366034, 32884.782835, 
-20848.385466, -3192.010574, 5651.579348, -33687.141423, -30034.828909, 
20344.288488, 9221.982599, 54177.278365, -52676.385869, -3440.450858, 

[jira] [Commented] (SPARK-2426) Quadratic Minimization for MLlib ALS

2014-08-13 Thread Debasish Das (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095232#comment-14095232
 ] 

Debasish Das commented on SPARK-2426:
-

Hi Xiangrui,

The branch is ready for an initial review. I will do a lot of clean-up this week.

https://github.com/debasish83/spark/commits/qp-als

optimization/QuadraticMinimizer.scala is the placeholder for all 
QuadraticMinimization. 

Right now we support 5 features:

1. Least square
2. Least square with positivity
3. Least square with bounds : generalization of positivity
4. Least square with equality and positivity/bounds for LDA/PLSA
5. Least square + L1 constraint for sparse NMF
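
As an illustration of how one solver can cover several of these formulations, 
here is a minimal ADMM sketch for minimizing 0.5 * x'Hx + c'x with the 
constraint supplied as a proximal operator. It uses a dense Cholesky 
factorization of H + rho*I followed by forward and backward solves. The names, 
the 0.5 convention, the fixed rho and the fixed iteration count are assumptions 
for illustration; this is not the QuadraticMinimizer code from the branch.

```scala
object AdmmSketch {
  type Vec = Array[Double]
  type Mat = Array[Array[Double]]

  /** Lower-triangular Cholesky factor L with L * L' = a (a must be SPD). */
  def cholesky(a: Mat): Mat = {
    val n = a.length
    val l = Array.ofDim[Double](n, n)
    for (i <- 0 until n; j <- 0 to i) {
      var s = a(i)(j)
      for (k <- 0 until j) s -= l(i)(k) * l(j)(k)
      l(i)(j) = if (i == j) math.sqrt(s) else s / l(j)(j)
    }
    l
  }

  /** Solves L * L' * x = b by forward then backward substitution. */
  def cholSolve(l: Mat, b: Vec): Vec = {
    val n = b.length
    val y = new Array[Double](n)
    for (i <- 0 until n) {
      var s = b(i)
      for (k <- 0 until i) s -= l(i)(k) * y(k)
      y(i) = s / l(i)(i)
    }
    val x = new Array[Double](n)
    for (i <- n - 1 to 0 by -1) {
      var s = y(i)
      for (k <- i + 1 until n) s -= l(k)(i) * x(k)
      x(i) = s / l(i)(i)
    }
    x
  }

  /** ADMM with splitting x = z; `prox` encodes the constraint or regularizer. */
  def minimize(h: Mat, c: Vec, prox: Vec => Vec,
               rho: Double = 1.0, iters: Int = 500): Vec = {
    val n = c.length
    // Factor H + rho * I once; every x-update reuses the factor.
    val l = cholesky(Array.tabulate(n, n)((i, j) => h(i)(j) + (if (i == j) rho else 0.0)))
    var z = new Array[Double](n)
    var u = new Array[Double](n)
    for (_ <- 0 until iters) {
      val x = cholSolve(l, Array.tabulate(n)(i => rho * (z(i) - u(i)) - c(i)))
      z = prox(Array.tabulate(n)(i => x(i) + u(i)))
      u = Array.tabulate(n)(i => u(i) + x(i) - z(i))
    }
    z
  }
}
```

Passing the identity as `prox` gives plain least squares (formulation 1), 
`v => v.map(math.max(0.0, _))` gives positivity (formulation 2), and a clamp to 
a box gives bounds (formulation 3). For an L1 term the soft-threshold level 
would be the L1 weight divided by rho.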

There are many more regularizers in Proximal.scala which can be re-used in 
the mllib updaters. L1Updater is an example of a proximal algorithm.

I feel we should move NNLS into QuadraticMinimizer as well and clean up 
ALS.scala, as you have suggested before.

QuadraticMinimizer is optimized for direct solves right now (Cholesky / LU, 
depending on the problem we are solving).

The CG core from NNLS should be used for iterative solves when ranks are 
high. I need a different variant of CG for Formulation 4, so NNLS CG is not 
sufficient for all the formulations.

Right now I am experimenting with ADMM rho and lambda values so that the NNLS 
iterations are on par with Least square with positivity. 

I will publish results from the comparisons.

I will also publish comparisons of PDCO, ECOS (IPM) and MOSEK with the ADMM 
variants used in this branch.

For the recommendation use case, I expect to produce Jellyfish L1 ball 
projection results on the netflix/movielens datasets using Formulation 5.

Thanks.
Deb

 Quadratic Minimization for MLlib ALS
 

 Key: SPARK-2426
 URL: https://issues.apache.org/jira/browse/SPARK-2426
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Affects Versions: 1.0.0
Reporter: Debasish Das
Assignee: Debasish Das
   Original Estimate: 504h
  Remaining Estimate: 504h

 Current ALS supports least squares and nonnegative least squares.
 I presented ADMM and IPM based Quadratic Minimization solvers to be used for 
 the following ALS problems:
 1. ALS with bounds
 2. ALS with L1 regularization
 3. ALS with Equality constraint and bounds
 Initial runtime comparisons are presented at Spark Summit. 
 http://spark-summit.org/2014/talk/quadratic-programing-solver-for-non-negative-matrix-factorization-with-spark
 Based on Xiangrui's feedback I am currently comparing the ADMM based 
 Quadratic Minimization solvers with IPM based QpSolvers and the default 
 ALS/NNLS. I will keep updating the runtime comparison results.
 For integration the detailed plan is as follows:
 1. Add ADMM and IPM based QuadraticMinimization solvers to 
 breeze.optimize.quadratic package.
 2. Add a QpSolver object in spark mllib optimization which calls breeze
 3. Add the QpSolver object in spark mllib ALS



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2426) Quadratic Minimization for MLlib ALS

2014-08-13 Thread Debasish Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Debasish Das updated SPARK-2426:


Description: 
Current ALS supports least squares and nonnegative least squares.

I presented ADMM and IPM based Quadratic Minimization solvers to be used for 
the following ALS problems:

1. ALS with bounds
2. ALS with L1 regularization
3. ALS with Equality constraint and bounds

Initial runtime comparisons are presented at Spark Summit. 

http://spark-summit.org/2014/talk/quadratic-programing-solver-for-non-negative-matrix-factorization-with-spark

Based on Xiangrui's feedback I am currently comparing the ADMM based Quadratic 
Minimization solvers with IPM based QpSolvers and the default ALS/NNLS. I will 
keep updating the runtime comparison results.

For integration the detailed plan is as follows:

1. Add QuadraticMinimizer and Proximal algorithms in mllib.optimization
2. Integrate QuadraticMinimizer in mllib ALS


  was:
Current ALS supports least squares and nonnegative least squares.

I presented ADMM and IPM based Quadratic Minimization solvers to be used for 
the following ALS problems:

1. ALS with bounds
2. ALS with L1 regularization
3. ALS with Equality constraint and bounds

Initial runtime comparisons are presented at Spark Summit. 

http://spark-summit.org/2014/talk/quadratic-programing-solver-for-non-negative-matrix-factorization-with-spark

Based on Xiangrui's feedback I am currently comparing the ADMM based Quadratic 
Minimization solvers with IPM based QpSolvers and the default ALS/NNLS. I will 
keep updating the runtime comparison results.

For integration the detailed plan is as follows:

1. Add ADMM and IPM based QuadraticMinimization solvers to 
breeze.optimize.quadratic package.
2. Add a QpSolver object in spark mllib optimization which calls breeze
3. Add the QpSolver object in spark mllib ALS






[jira] [Comment Edited] (SPARK-2426) Quadratic Minimization for MLlib ALS

2014-08-13 Thread Debasish Das (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095232#comment-14095232
 ] 

Debasish Das edited comment on SPARK-2426 at 8/13/14 3:31 PM:
--

Hi Xiangrui,

The branch is ready for an initial review. I will do a lot of clean-up this week. 

I need some advice on whether we should bring in the additional ALS features 
first or integrate NNLS with QuadraticMinimizer so that we can handle large 
ranks as well. 

https://github.com/debasish83/spark/commits/qp-als

optimization/QuadraticMinimizer.scala is the placeholder for all 
QuadraticMinimization. 

Right now we support 5 features:

1. Least square
2. Least square with positivity
3. Least square with bounds : generalization of positivity
4. Least square with equality and positivity/bounds for LDA/PLSA
5. Least square + L1 constraint for sparse NMF

There are many more regularizers in Proximal.scala which can be re-used in 
the mllib updaters. L1Updater in mllib is an example of a proximal algorithm.

QuadraticMinimizer is optimized for direct solves right now (Cholesky / LU, 
depending on the problem we are solving).

The CG core from NNLS should be used for iterative solves when ranks are 
high. I need a different variant of CG for Formulation 4, so NNLS CG is not 
sufficient for all the formulations this branch supports.

Right now I am experimenting with ADMM rho and lambda values so that the NNLS 
iterations are on par with Least square with positivity. The ideas for rho and 
lambda tuning are the following:

1. Derive an optimal value of lambda for quadratic problems, similar to the 
idea of Nesterov's acceleration used in algorithms like FISTA and the 
accelerated ADMM from UCLA
2. Equilibrate/scale the gram matrix such that rho can always be set to 1.0 
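
One simple way to read idea 2 is symmetric diagonal (Jacobi) scaling, which 
rescales the gram matrix H to D*H*D with unit diagonal. The comment does not say 
which equilibration the branch uses, so the scheme and the names below are 
assumptions. The 3x3 matrix is the leading block of the gram matrix posted in 
SPARK-3987.

```scala
object EquilibrateSketch {
  type Mat = Array[Array[Double]]

  /** Returns (D*H*D, diagonal of D) with d_i = 1 / sqrt(H_ii); needs H_ii > 0. */
  def jacobiScale(h: Mat): (Mat, Array[Double]) = {
    val d = h.indices.map(i => 1.0 / math.sqrt(h(i)(i))).toArray
    val scaled = Array.tabulate(h.length, h.length)((i, j) => d(i) * h(i)(j) * d(j))
    (scaled, d)
  }

  // Leading 3x3 block of the 20x20 gram matrix from SPARK-3987.
  val block: Mat = Array(
    Array(333907.312770, -60814.043975, 207935.829941),
    Array(-60814.043975, 13826.806664, -38032.612640),
    Array(207935.829941, -38032.612640, 129661.677608))

  def main(args: Array[String]): Unit = {
    val (scaled, d) = jacobiScale(block)
    scaled.foreach(row => println(row.mkString(", ")))
    println("d = " + d.mkString(", "))
  }
}
```

If y solves the scaled problem with linear term D*c, the solution of the 
original problem is x = D*y. Any constraint has to be rewritten in terms of y 
as well; positivity is unaffected because every d_i is positive.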

In Matlab-based experiments with PDCO, ECOS (IPM), MOSEK and the ADMM variants, 
ADMM is faster while producing results within 1e-4 of MOSEK. I will 
publish the numbers and the Matlab script through the ECOS jnilib open source 
project (GPL licensed). I did not add any ECOS code here so that everything 
stays Apache licensed.

For the recommendation use case, I expect to produce Jellyfish L1 ball 
projection results on the netflix/movielens datasets using Formulation 5.

Example runs:

Least square with equality and positivity for topic modeling, all topics sum to 
1.0:

bin/spark-submit --class org.apache.spark.examples.mllib.MovieLensALS \
  examples/target/scala-*/spark-examples-*.jar \
  --rank 25 --numIterations 10 --lambda 1.0 --kryo --qpProblem 4 \
  data/mllib/sample_movielens_data.txt

Least square with L1 regularization:

bin/spark-submit --class org.apache.spark.examples.mllib.MovieLensALS \
  examples/target/scala-*/spark-examples-*.jar \
  --rank 25 --numIterations 10 --lambda 1.0 --lambdaL1 1e-2 --kryo \
  --qpProblem 5 \
  data/mllib/sample_movielens_data.txt

Thanks.
Deb




[jira] [Commented] (SPARK-2602) sbt/sbt test steals window focus on OS X

2014-07-20 Thread Debasish Das (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068094#comment-14068094
 ] 

Debasish Das commented on SPARK-2602:
-

CDH5 does not even support Java 6 anymore!

 sbt/sbt test steals window focus on OS X
 

 Key: SPARK-2602
 URL: https://issues.apache.org/jira/browse/SPARK-2602
 Project: Spark
  Issue Type: Improvement
  Components: Build
Reporter: Nicholas Chammas
Priority: Minor

 On OS X, I run {{sbt/sbt test}} from Terminal and then go off and do 
 something else with my computer. It appears that there are several things in 
 the test suite that launch Java programs that, for some reason, steal window 
 focus. 
 It can get very annoying, especially if you happen to be typing something in 
 a different window, to be suddenly teleported to a random Java application 
 and have your finely crafted keystrokes be sent where they weren't intended.
 It would be nice if {{sbt/sbt test}} didn't do that.


