[jira] [Updated] (SPARK-6323) Large rank matrix factorization with Nonlinear loss and constraints

2015-05-28 Thread Debasish Das (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-6323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Debasish Das updated SPARK-6323:

Affects Version/s: (was: 1.4.0)

> Large rank matrix factorization with Nonlinear loss and constraints
> ---
>
> Key: SPARK-6323
> URL: https://issues.apache.org/jira/browse/SPARK-6323
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, MLlib
>Reporter: Debasish Das
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Currently ml.recommendation.ALS is optimized for Gram matrix generation, which
> scales to modest ranks. The problems that we can solve are in the normal
> equation/quadratic form: 0.5x'Hx + c'x + g(z)
> g(z) can be one of the constraints from the Breeze proximal library:
> https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/proximal/Proximal.scala
> In this PR we will re-use the ml.recommendation.ALS design and come up with
> ml.recommendation.ALM (Alternating Minimization). Thanks to [~mengxr]'s recent
> changes, it's straightforward to do now!
> ALM will be capable of solving the following problems: min f(x) + g(z)
> 1. Loss function f(x) can be LeastSquareLoss or LoglikelihoodLoss. Most
> likely we will re-use the Gradient interfaces already defined and implement
> LoglikelihoodLoss.
> 2. Constraints g(z) supported are the same as above, except that we don't yet
> support affine + bounds constraints (Aeq x = beq, lb <= x <= ub). Most likely
> we don't need those for ML applications.
> 3. For the solver we will use breeze.optimize.proximal.NonlinearMinimizer,
> which in turn uses a projection-based solver (SPG) or proximal solvers (ADMM)
> based on convergence speed.
> https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/proximal/NonlinearMinimizer.scala
> 4. The factors will be SparseVector so that we keep shuffle size in check.
> For example, we will run with 10K ranks but force factors to be
> 100-sparse.
> This is closely related to Sparse LDA
> https://issues.apache.org/jira/browse/SPARK-5564 with the difference that we
> are not using a graph representation here.
> As we do scaling experiments, we will understand which flow is better suited as
> ratings get denser (my understanding is that since we already scaled ALS to 2
> billion ratings and we will keep sparsity in check, the same 2 billion flow
> will scale to 10K ranks as well).
> This JIRA is intended to extend the capabilities of ml recommendation to
> generalized loss functions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-6323) Large rank matrix factorization with Nonlinear loss and constraints

2015-04-27 Thread Joseph K. Bradley (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-6323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-6323:
-
Target Version/s:   (was: 1.5.0)

> Large rank matrix factorization with Nonlinear loss and constraints
> ---
>
> Key: SPARK-6323
> URL: https://issues.apache.org/jira/browse/SPARK-6323
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, MLlib
>Affects Versions: 1.4.0
>Reporter: Debasish Das
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>




[jira] [Updated] (SPARK-6323) Large rank matrix factorization with Nonlinear loss and constraints

2015-04-27 Thread Joseph K. Bradley (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-6323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-6323:
-
Target Version/s: 1.5.0  (was: 1.4.0)

> Large rank matrix factorization with Nonlinear loss and constraints
> ---
>
> Key: SPARK-6323
> URL: https://issues.apache.org/jira/browse/SPARK-6323
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, MLlib
>Affects Versions: 1.4.0
>Reporter: Debasish Das
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>




[jira] [Updated] (SPARK-6323) Large rank matrix factorization with Nonlinear loss and constraints

2015-03-28 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-6323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-6323:
-
Fix Version/s: (was: 1.4.0)

> Large rank matrix factorization with Nonlinear loss and constraints
> ---
>
> Key: SPARK-6323
> URL: https://issues.apache.org/jira/browse/SPARK-6323
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, MLlib
>Affects Versions: 1.4.0
>Reporter: Debasish Das
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>




[jira] [Updated] (SPARK-6323) Large rank matrix factorization with Nonlinear loss and constraints

2015-03-22 Thread Debasish Das (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-6323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Debasish Das updated SPARK-6323:

Description: 
Currently ml.recommendation.ALS is optimized for gram matrix generation which 
scales to modest ranks. The problems that we can solve are in the normal 
equation/quadratic form: 0.5x'Hx + c'x + g(z)
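
To make the quadratic form concrete, here is a minimal sketch, assuming the usual regularized least-squares subproblem for a single user: with item factors Y (one row per rated item), ratings r, and an L2 weight lam, taking H = Y'Y + lam*I and c = -Y'r reproduces the ridge loss up to the constant 0.5*r'r. The toy data and function names are hypothetical and are not taken from Spark or Breeze.

```python
# Hypothetical toy data: 3 rated items, rank-2 item factors (rows of Y).
Y = [[1.0, 0.0], [0.5, 1.0], [0.0, 2.0]]
r = [1.0, 2.0, 3.0]
lam = 0.1  # assumed L2 regularization weight


def gram(Y, lam):
    """H = Y'Y + lam * I (the Gram matrix plus the ridge term)."""
    k = len(Y[0])
    H = [[sum(row[i] * row[j] for row in Y) for j in range(k)] for i in range(k)]
    for i in range(k):
        H[i][i] += lam
    return H


def linear_term(Y, r):
    """c = -Y'r."""
    k = len(Y[0])
    return [-sum(row[i] * ri for row, ri in zip(Y, r)) for i in range(k)]


def quadratic(H, c, x):
    """0.5 * x'Hx + c'x."""
    k = len(x)
    xHx = sum(x[i] * H[i][j] * x[j] for i in range(k) for j in range(k))
    return 0.5 * xHx + sum(ci * xi for ci, xi in zip(c, x))


def ridge_loss(Y, r, lam, x):
    """0.5 * ||Yx - r||^2 + 0.5 * lam * ||x||^2."""
    resid = [sum(yi * xi for yi, xi in zip(row, x)) - ri for row, ri in zip(Y, r)]
    return 0.5 * sum(e * e for e in resid) + 0.5 * lam * sum(xi * xi for xi in x)
```

For any x, ridge_loss(Y, r, lam, x) equals quadratic(gram(Y, lam), linear_term(Y, r), x) + 0.5 * sum(ri * ri for ri in r), which is why only H and c need to be assembled per user or item.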

g(z) can be one of the constraints from Breeze proximal library:
https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/proximal/Proximal.scala
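
As an illustration of what such a g(z) looks like in practice, the sketch below gives two standard proximal operators: soft-thresholding for an L1 penalty and projection onto the nonnegative orthant. The function names are illustrative only and are not the Breeze API; the linked Proximal.scala is the reference for what the library actually provides.

```python
def prox_l1(v, t):
    """Soft-thresholding: the proximal operator of t * ||x||_1."""
    out = []
    for vi in v:
        if vi > t:
            out.append(vi - t)
        elif vi < -t:
            out.append(vi + t)
        else:
            out.append(0.0)
    return out


def project_nonnegative(v):
    """Projection onto {x : x >= 0}, the prox of the positivity constraint."""
    return [max(vi, 0.0) for vi in v]
```

In an ADMM-style splitting, an operator of this kind is applied to the z variable at every iteration, while the x update handles the smooth part of the objective.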

In this PR we will re-use the ml.recommendation.ALS design and come up with 
ml.recommendation.ALM (Alternating Minimization). Thanks to [~mengxr]'s recent 
changes, it's straightforward to do now!

ALM will be capable of solving the following problems: min f(x) + g(z)

1. Loss function f(x) can be LeastSquareLoss or LoglikelihoodLoss. Most 
likely we will re-use the Gradient interfaces already defined and implement 
LoglikelihoodLoss.

2. Constraints g(z) supported are the same as above, except that we don't yet 
support affine + bounds constraints (Aeq x = beq, lb <= x <= ub). Most likely 
we don't need those for ML applications.

3. For the solver we will use breeze.optimize.proximal.NonlinearMinimizer, which 
in turn uses a projection-based solver (SPG) or proximal solvers (ADMM) based on 
convergence speed.

https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/proximal/NonlinearMinimizer.scala

4. The factors will be SparseVector so that we keep shuffle size in check. For 
example we will run with 10K ranks but we will force factors to be 100-sparse.
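
The description does not say how factors are forced to be 100-sparse. One possible way, shown here purely as an assumption, is hard thresholding: keep the k largest-magnitude entries and store them as (size, indices, values), which is the same layout a sparse vector uses. The function name is hypothetical.

```python
def sparsify_top_k(dense, k):
    """Keep the k largest-magnitude entries of a dense factor.

    Returns (size, indices, values) with indices in increasing order.
    This is an illustrative stand-in, not the mechanism proposed in the JIRA.
    """
    if k <= 0:
        return (len(dense), [], [])
    by_magnitude = sorted(range(len(dense)), key=lambda i: abs(dense[i]), reverse=True)
    indices = sorted(by_magnitude[:k])
    return (len(dense), indices, [dense[i] for i in indices])
```

With rank 10,000 and k = 100, such a factor carries 100 indices and 100 values instead of 10,000 dense entries, which is the shuffle saving the item above refers to.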

This is closely related to Sparse LDA 
https://issues.apache.org/jira/browse/SPARK-5564 with the difference that we 
are not using a graph representation here.

As we do scaling experiments, we will understand which flow is better suited as 
ratings get denser (my understanding is that since we already scaled ALS to 2 
billion ratings and we will keep sparsity in check, the same 2 billion flow 
will scale to 10K ranks as well).

This JIRA is intended to extend the capabilities of ml recommendation to 
generalized loss functions.

  was:
Currently ml.recommendation.ALS is optimized for gram matrix generation which 
scales to modest ranks. The problems that we can solve are in the normal 
equation/quadratic form: 0.5x'Hx + c'x + g(z)

g(z) can be one of the constraints from Breeze proximal library:
https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/proximal/Proximal.scala

In this PR we will re-use ml.recommendation.ALS design and come up with 
ml.recommendation.ALM (Alternating Minimization). Thanks to [~mengxr] recent 
changes, it's straightforward to do it now !

ALM will be capable of solving the following problems: min f ( x ) + g ( z )

1. Loss function f ( x ) can be LeastSquareLoss, LoglikelihoodLoss and 
HingeLoss. Most likely we will re-use the Gradient interfaces already defined 
and implement LoglikelihoodLoss

2. Constraints g ( z ) supported are same as above except that we don't support 
affine + bounds yet Aeq x = beq , lb <= x <= ub yet. Most likely we don't need 
that for ML applications

3. For solver we will use breeze.optimize.proximal.NonlinearMinimizer which in 
turn uses projection based solver (SPG) or proximal solvers (ADMM) based on 
convergence speed.

https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/proximal/NonlinearMinimizer.scala

4. The factors will be SparseVector so that we keep shuffle size in check. For 
example we will run with 10K ranks but we will force factors to be 100-sparse.

This is closely related to Sparse LDA 
https://issues.apache.org/jira/browse/SPARK-5564 with the difference that we 
are not using graph representation here.

As we do scaling experiments, we will understand which flow is more suited as 
ratings get denser (my understanding is that since we already scaled ALS to 2 
billion ratings and we will keep sparsity in check, the same 2 billion flow 
will scale to 10K ranks as well)...

This JIRA is intended to extend the capabilities of ml recommendation to 
generalized loss function.


> Large rank matrix factorization with Nonlinear loss and constraints
> ---
>
> Key: SPARK-6323
> URL: https://issues.apache.org/jira/browse/SPARK-6323
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, MLlib
>Affects Versions: 1.4.0
>Reporter: Debasish Das
> Fix For: 1.4.0
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Currently ml.recommendation.ALS is optimized for gram matrix generation which 
> scales to modest ranks. The problems that we can solve are in the normal 
> equation/quadratic form: 0.5x'Hx + c'x + g(z)
> g(z) can be one of the constraints from Breeze proximal library:
> htt

[jira] [Updated] (SPARK-6323) Large rank matrix factorization with Nonlinear loss and constraints

2015-03-13 Thread Debasish Das (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-6323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Debasish Das updated SPARK-6323:

Description: 
Currently ml.recommendation.ALS is optimized for gram matrix generation which 
scales to modest ranks. The problems that we can solve are in the normal 
equation/quadratic form: 0.5x'Hx + c'x + g(z)

g(z) can be one of the constraints from Breeze proximal library:
https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/proximal/Proximal.scala

In this PR we will re-use ml.recommendation.ALS design and come up with 
ml.recommendation.ALM (Alternating Minimization). Thanks to [~mengxr] recent 
changes, it's straightforward to do it now !

ALM will be capable of solving the following problems: min f ( x ) + g ( z )

1. Loss function f ( x ) can be LeastSquareLoss, LoglikelihoodLoss and 
HingeLoss. Most likely we will re-use the Gradient interfaces already defined 
and implement LoglikelihoodLoss

2. Constraints g ( z ) supported are same as above except that we don't support 
affine + bounds yet Aeq x = beq , lb <= x <= ub yet. Most likely we don't need 
that for ML applications

3. For solver we will use breeze.optimize.proximal.NonlinearMinimizer which in 
turn uses projection based solver (SPG) or proximal solvers (ADMM) based on 
convergence speed.

https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/proximal/NonlinearMinimizer.scala

4. The factors will be SparseVector so that we keep shuffle size in check. For 
example we will run with 10K ranks but we will force factors to be 100-sparse.

This is closely related to Sparse LDA 
https://issues.apache.org/jira/browse/SPARK-5564 with the difference that we 
are not using graph representation here.

As we do scaling experiments, we will understand which flow is more suited as 
ratings get denser (my understanding is that since we already scaled ALS to 2 
billion ratings and we will keep sparsity in check, the same 2 billion flow 
will scale to 10K ranks as well)...

This JIRA is intended to extend the capabilities of ml recommendation to 
generalized loss function.

  was:
Currently ml.recommendation.ALS is optimized for gram matrix generation which 
scales to modest ranks. The problems that we can solve are in the normal 
equation/quadratic form: 0.5x'Hx + c'x + g(z)

g(z) can be one of the constraints from Breeze proximal library:
https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/proximal/Proximal.scala

In this PR we will re-use ml.recommendation.ALS design and come up with 
ml.recommendation.ALM (Alternating Minimization). Thanks to [~mengxr] recent 
changes, it's straightforward to do it now !

ALM will be capable of solving the following problems: min f ( x ) + g ( z )

1. Loss function f ( x ) can be LeastSquareLoss, LoglikelihoodLoss and 
HingeLoss. Most likely we will re-use the Gradient interfaces already defined 
and implement LoglikelihoodLoss

2. Constraints g ( z ) supported are same as above except that we don't support 
affine + bounds yet Aeq x = beq , lb <= x <= ub yet. Most likely we don't need 
that for ML applications

3. For solver we will use breeze.optimize.proximal.NonlinearMinimizer which in 
turn uses projection based solver (SPG) or proximal solvers (ADMM) based on 
convergence speed.

https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/proximal/NonlinearMinimizer.scala

4. The factors will be SparseVector so that we keep shuffle size in check. For 
example we will run with 10K ranks but we will force factors to be 100-sparse.

This is closely related to Sparse LDA 
https://issues.apache.org/jira/browse/SPARK-5564 with the difference that we 
are not using graph representation here.

As we do scaling experiments, we will understand which flow is more suited as 
ratings get denser (my understanding is that since we already scaled ALS to 2 
billion ratings and since we will keep sparsity in check, the same 2 billion 
flow will scale to 10K ranks as well)...

This JIRA is intended to extend the capabilities of ml recommendation to 
generalized loss function.


> Large rank matrix factorization with Nonlinear loss and constraints
> ---
>
> Key: SPARK-6323
> URL: https://issues.apache.org/jira/browse/SPARK-6323
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, MLlib
>Affects Versions: 1.4.0
>Reporter: Debasish Das
> Fix For: 1.4.0
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Currently ml.recommendation.ALS is optimized for gram matrix generation which 
> scales to modest ranks. The problems that we can solve are in the normal 
> equation/quadratic form: 0.5x'Hx + c'x + g(z)
> g(z) can be one of the constraints from Breeze proxim

[jira] [Updated] (SPARK-6323) Large rank matrix factorization with Nonlinear loss and constraints

2015-03-13 Thread Debasish Das (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-6323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Debasish Das updated SPARK-6323:

Description: 
Currently ml.recommendation.ALS is optimized for gram matrix generation which 
scales to modest ranks. The problems that we can solve are in the normal 
equation/quadratic form: 0.5x'Hx + c'x + g(z)

g(z) can be one of the constraints from Breeze proximal library:
https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/proximal/Proximal.scala

In this PR we will re-use ml.recommendation.ALS design and come up with 
ml.recommendation.ALM (Alternating Minimization). Thanks to [~mengxr] recent 
changes, it's straightforward to do it now !

ALM will be capable of solving the following problems: min f ( x ) + g ( z )

1. Loss function f ( x ) can be LeastSquareLoss, LoglikelihoodLoss and 
HingeLoss. Most likely we will re-use the Gradient interfaces already defined 
and implement LoglikelihoodLoss

2. Constraints g ( z ) supported are same as above except that we don't support 
affine + bounds yet Aeq x = beq , lb <= x <= ub yet. Most likely we don't need 
that for ML applications

3. For solver we will use breeze.optimize.proximal.NonlinearMinimizer which in 
turn uses projection based solver (SPG) or proximal solvers (ADMM) based on 
convergence speed.

https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/proximal/NonlinearMinimizer.scala

4. The factors will be SparseVector so that we keep shuffle size in check. For 
example we will run with 10K ranks but we will force factors to be 100-sparse.

This is closely related to Sparse LDA 
https://issues.apache.org/jira/browse/SPARK-5564 with the difference that we 
are not using graph representation here.

As we do scaling experiments, we will understand which flow is more suited as 
ratings get denser (my understanding is that since we already scaled ALS to 2 
billion ratings and since we will keep sparsity in check, the same 2 billion 
flow will scale to 10K ranks as well)...

This JIRA is intended to extend the capabilities of ml recommendation to 
generalized loss function.

  was:
Currently ml.recommendation.ALS is optimized for gram matrix generation which 
only scales to modest ranks. The problems that we can solve are in the normal 
equation/quadratic form: 0.5x'Hx + c'x + g(z)

g(z) can be one of the constraints from Breeze proximal library:
https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/proximal/Proximal.scala

In this PR we will re-use ml.recommendation.ALS design and come up with 
ml.recommendation.ALM (Alternating Minimization). Thanks to [~mengxr] recent 
changes, it's straightforward to do it now !

ALM will be capable of solving the following problems: min f ( x ) + g ( z )

1. Loss function f ( x ) can be LeastSquareLoss, LoglikelihoodLoss and 
HingeLoss. Most likely we will re-use the Gradient interfaces already defined 
and implement LoglikelihoodLoss

2. Constraints g ( z ) supported are same as above except that we don't support 
affine + bounds yet Aeq x = beq , lb <= x <= ub yet. Most likely we don't need 
that for ML applications

3. For solver we will use breeze.optimize.proximal.NonlinearMinimizer which in 
turn uses projection based solver (SPG) or proximal solvers (ADMM) based on 
convergence speed.

https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/proximal/NonlinearMinimizer.scala

4. The factors will be SparseVector so that we keep shuffle size in check. For 
example we will run with 10K ranks but we will force factors to be 100-sparse.

This is closely related to Sparse LDA 
https://issues.apache.org/jira/browse/SPARK-5564 with the difference that we 
are not using graph representation here.

As we do scaling experiments, we will understand which flow is more suited as 
ratings get denser (my understanding is that since we already scaled ALS to 2 
billion ratings and since we will keep sparsity in check, the same 2 billion 
flow will scale to 10K ranks as well)...

This JIRA is intended to extend the capabilities of ml recommendation to 
generalized loss function.


> Large rank matrix factorization with Nonlinear loss and constraints
> ---
>
> Key: SPARK-6323
> URL: https://issues.apache.org/jira/browse/SPARK-6323
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, MLlib
>Affects Versions: 1.4.0
>Reporter: Debasish Das
> Fix For: 1.4.0
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Currently ml.recommendation.ALS is optimized for gram matrix generation which 
> scales to modest ranks. The problems that we can solve are in the normal 
> equation/quadratic form: 0.5x'Hx + c'x + g(z)
> g(z) can be one of the constraints from Br

[jira] [Updated] (SPARK-6323) Large rank matrix factorization with Nonlinear loss and constraints

2015-03-13 Thread Debasish Das (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-6323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Debasish Das updated SPARK-6323:

Description: 
Currently ml.recommendation.ALS is optimized for gram matrix generation which 
only scales to modest ranks. The problems that we can solve are in the normal 
equation/quadratic form: 0.5x'Hx + c'x + g(z)

g(z) can be one of the constraints from Breeze proximal library:
https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/proximal/Proximal.scala

In this PR we will re-use ml.recommendation.ALS design and come up with 
ml.recommendation.ALM (Alternating Minimization). Thanks to [~mengxr] recent 
changes, it's straightforward to do it now !

ALM will be capable of solving the following problems: min f ( x ) + g ( z )

1. Loss function f ( x ) can be LeastSquareLoss, LoglikelihoodLoss and 
HingeLoss. Most likely we will re-use the Gradient interfaces already defined 
and implement LoglikelihoodLoss

2. Constraints g ( z ) supported are same as above except that we don't support 
affine + bounds yet Aeq x = beq , lb <= x <= ub yet. Most likely we don't need 
that for ML applications

3. For solver we will use breeze.optimize.proximal.NonlinearMinimizer which in 
turn uses projection based solver (SPG) or proximal solvers (ADMM) based on 
convergence speed.

https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/proximal/NonlinearMinimizer.scala

4. The factors will be SparseVector so that we keep shuffle size in check. For 
example we will run with 10K ranks but we will force factors to be 100-sparse.

This is closely related to Sparse LDA 
https://issues.apache.org/jira/browse/SPARK-5564 with the difference that we 
are not using graph representation here.

As we do scaling experiments, we will understand which flow is more suited as 
ratings get denser (my understanding is that since we already scaled ALS to 2 
billion ratings and since we will keep sparsity in check, the same 2 billion 
flow will scale to 10K ranks as well)...

This JIRA is intended to extend the capabilities of ml recommendation to 
generalized loss function.

  was:
Currently ml.recommendation.ALS is optimized for gram matrix generation which 
only scales to modest ranks. The problems that we can solve are in the normal 
equation/quadratic form: 0.5x'Hx + c'x + g(z)

g(z) can be one of the constraints from Breeze proximal library:
https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/proximal/Proximal.scala

In this PR we will re-use ml.recommendation.ALS design and come up with 
ml.recommendation.ALM (Alternating Minimization). Thanks to [~mengxr] recent 
changes, it's straightforward to do it now !

ALM will be capable of solving the following problems: min f ( x ) + g ( z )

1. Loss function f ( x ) can be LeastSquareLoss, LoglikelihoodLoss and 
HingeLoss. Most likely we will re-use the Gradient interfaces already defined 
and implement LoglikelihoodLoss

2. Constraints g(z) supported are same as above except that we don't support 
affine + bounds yet Aeq x = beq , lb <= x <= ub yet. Most likely we don't need 
that for ML applications

3. For solver we will use breeze.optimize.proximal.NonlinearMinimizer which in 
turn uses projection based solver (SPG) or proximal solvers (ADMM) based on 
convergence speed.

https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/proximal/NonlinearMinimizer.scala

4. The factors will be SparseVector so that we keep shuffle size in check. For 
example we will run with 10K ranks but we will force factors to be 100-sparse.

This is closely related to Sparse LDA 
https://issues.apache.org/jira/browse/SPARK-5564 with the difference that we 
are not using graph representation here.

As we do scaling experiments, we will understand which flow is more suited as 
ratings get denser (my understanding is that since we already scaled ALS to 2 
billion ratings and since we will keep sparsity in check, the same 2 billion 
flow will scale to 10K ranks as well)...

This JIRA is intended to extend the capabilities of ml recommendation to 
generalized loss function.


> Large rank matrix factorization with Nonlinear loss and constraints
> ---
>
> Key: SPARK-6323
> URL: https://issues.apache.org/jira/browse/SPARK-6323
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, MLlib
>Affects Versions: 1.4.0
>Reporter: Debasish Das
> Fix For: 1.4.0
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Currently ml.recommendation.ALS is optimized for gram matrix generation which 
> only scales to modest ranks. The problems that we can solve are in the normal 
> equation/quadratic form: 0.5x'Hx + c'x + g(z)
> g(z) can be one of the constraints

[jira] [Updated] (SPARK-6323) Large rank matrix factorization with Nonlinear loss and constraints

2015-03-13 Thread Debasish Das (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-6323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Debasish Das updated SPARK-6323:

Description: 
Currently ml.recommendation.ALS is optimized for gram matrix generation which 
only scales to modest ranks. The problems that we can solve are in the normal 
equation/quadratic form: 0.5x'Hx + c'x + g(z)

g(z) can be one of the constraints from Breeze proximal library:
https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/proximal/Proximal.scala
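
As an illustration of the quadratic form above, the following sketch minimizes 0.5x'Hx + c'x + g(x) with g taken to be an L1 penalty lam*||x||_1, whose proximal operator is soft-thresholding. It uses plain proximal gradient iterations, not the ADMM or SPG solvers in Breeze, and does not call the Breeze API. The names soft_threshold and prox_gradient, and the small H, c, lam and step values, are made up for the example.

```python
def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1, applied elementwise."""
    return [a - t if a > t else a + t if a < -t else 0.0 for a in v]

def matvec(H, x):
    return [sum(h * xi for h, xi in zip(row, x)) for row in H]

def prox_gradient(H, c, lam, step, iters=500):
    """Minimize 0.5 x'Hx + c'x + lam * ||x||_1 by proximal gradient steps."""
    x = [0.0] * len(c)
    for _ in range(iters):
        # Gradient of the smooth part: Hx + c.
        grad = [g + ci for g, ci in zip(matvec(H, x), c)]
        # Gradient step on the quadratic, then prox step on the L1 term.
        x = soft_threshold([xi - step * gi for xi, gi in zip(x, grad)],
                           step * lam)
    return x

# Small positive definite H; step is below 1 / (largest eigenvalue of H).
H = [[2.0, 0.5], [0.5, 1.0]]
c = [-2.0, -0.1]
x = prox_gradient(H, c, lam=0.5, step=0.4)
print(x)
```

Swapping soft_threshold for another proximal operator (for example, projection onto the nonnegative orthant) changes the constraint without touching the quadratic part, which is the separation the f + g formulation is after.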

In this PR we will re-use the ml.recommendation.ALS design and come up with 
ml.recommendation.ALM (Alternating Minimization). Thanks to [~mengxr]'s recent 
changes, this is now straightforward.

ALM will be capable of solving the following problems: min f ( x ) + g ( z )

1. Loss function f ( x ) can be LeastSquareLoss, LoglikelihoodLoss and 
HingeLoss. Most likely we will re-use the Gradient interfaces already defined 
and implement LoglikelihoodLoss

2. Constraints g(z) supported are the same as above, except that we do not yet 
support affine + bounds constraints (Aeq x = beq, lb <= x <= ub). Most likely 
we don't need those for ML applications

3. For solver we will use breeze.optimize.proximal.NonlinearMinimizer which in 
turn uses projection based solver (SPG) or proximal solvers (ADMM) based on 
convergence speed.

https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/proximal/NonlinearMinimizer.scala

4. The factors will be SparseVector so that we keep shuffle size in check. For 
example we will run with 10K ranks but we will force factors to be 100-sparse.
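
For item 1 above, a loss that plugs into a gradient-based solver only needs to return its value and its gradient at x. The sketch below shows that shape for a least-squares loss and for a log-likelihood loss. The log-likelihood loss is assumed here to be the logistic loss log(1 + exp(-y a'x)) with labels y in {-1, +1}, which may differ from the final LoglikelihoodLoss definition. The function names are illustrative and are not Spark's Gradient interface.

```python
import math

def dot(a, x):
    return sum(ai * xi for ai, xi in zip(a, x))

def least_square_loss(a, y, x):
    """0.5 * (a'x - y)^2 and its gradient with respect to x."""
    r = dot(a, x) - y
    return 0.5 * r * r, [r * ai for ai in a]

def loglikelihood_loss(a, y, x):
    """log(1 + exp(-y * a'x)) for y in {-1, +1}, and its gradient."""
    m = y * dot(a, x)
    loss = math.log1p(math.exp(-m))
    scale = -y / (1.0 + math.exp(m))
    return loss, [scale * ai for ai in a]

def numeric_grad(loss_fn, a, y, x, eps=1e-6):
    """Central finite differences, used only to sanity-check the gradients."""
    g = []
    for i in range(len(x)):
        hi = x[:i] + [x[i] + eps] + x[i + 1:]
        lo = x[:i] + [x[i] - eps] + x[i + 1:]
        g.append((loss_fn(a, y, hi)[0] - loss_fn(a, y, lo)[0]) / (2 * eps))
    return g

a = [1.0, -2.0, 0.5]      # e.g. one item factor
x = [0.3, 0.1, -0.4]      # e.g. one user factor being solved for

ls_value, ls_grad = least_square_loss(a, 1.0, x)
ll_value, ll_grad = loglikelihood_loss(a, -1.0, x)
print(ls_grad, ll_grad)
```

Because both losses expose the same (value, gradient) pair, a solver can treat f as a black box and handle the constraint g separately.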

This is closely related to Sparse LDA 
https://issues.apache.org/jira/browse/SPARK-5564 with the difference that we 
are not using graph representation here.

As we run scaling experiments, we will learn which flow is better suited as 
ratings get denser. My understanding is that, since we already scaled ALS to 2 
billion ratings and will keep sparsity in check, the same 2-billion-rating 
flow will scale to 10K ranks as well.

This JIRA is intended to extend the capabilities of ml recommendation to 
generalized loss functions.

  was:
Currently ml.recommendation.ALS is optimized for gram matrix generation which 
only scales to modest ranks. The problems that we can solve are in the normal 
equation/quadratic form: 0.5x'Hx + c'x + g(z)

g(z) can be one of the constraints from Breeze proximal library:
https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/proximal/Proximal.scala

In this PR we will re-use ml.recommendation.ALS design and come up with 
ml.recommendation.ALM (Alternating Minimization). Thanks to [~mengxr] recent 
changes, it's straightforward to do it now !

ALM will be capable of solving the following problems: min f(x) + g(z)

1. Loss function f(x) can be LeastSquareLoss, LoglikelihoodLoss and HingeLoss. 
Most likely we will re-use the Gradient interfaces already defined and 
implement LoglikelihoodLoss

2. Constraints g(z) supported are same as above except that we don't support 
affine + bounds yet Aeq x = beq , lb <= x <= ub yet. Most likely we don't need 
that for ML applications

3. For solver we will use breeze.optimize.proximal.NonlinearMinimizer which in 
turn uses projection based solver (SPG) or proximal solvers (ADMM) based on 
convergence speed.

https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/proximal/NonlinearMinimizer.scala

4. The factors will be SparseVector so that we keep shuffle size in check. For 
example we will run with 10K ranks but we will force factors to be 100-sparse.

This is closely related to Sparse LDA 
https://issues.apache.org/jira/browse/SPARK-5564 with the difference that we 
are not using graph representation here.

As we do scaling experiments, we will understand which flow is more suited as 
ratings get denser (my understanding is that since we already scaled ALS to 2 
billion ratings and since we will keep sparsity in check, the same 2 billion 
flow will scale to 10K ranks as well)...

This JIRA is intended to extend the capabilities of ml recommendation to 
generalized loss function.


> Large rank matrix factorization with Nonlinear loss and constraints
> ---
>
> Key: SPARK-6323
> URL: https://issues.apache.org/jira/browse/SPARK-6323
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, MLlib
>Affects Versions: 1.4.0
>Reporter: Debasish Das
> Fix For: 1.4.0
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Currently ml.recommendation.ALS is optimized for gram matrix generation which 
> only scales to modest ranks. The problems that we can solve are in the normal 
> equation/quadratic form: 0.5x'Hx + c'x + g(z)
> g(z) can be one of the constraints from Breeze

[jira] [Updated] (SPARK-6323) Large rank matrix factorization with Nonlinear loss and constraints

2015-03-13 Thread Debasish Das (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-6323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Debasish Das updated SPARK-6323:

Description: 
Currently ml.recommendation.ALS is optimized for gram matrix generation which 
only scales to modest ranks. The problems that we can solve are in the normal 
equation/quadratic form: 0.5x'Hx + c'x + g(z)

g(z) can be one of the constraints from Breeze proximal library:
https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/proximal/Proximal.scala

In this PR we will re-use ml.recommendation.ALS design and come up with 
ml.recommendation.ALM (Alternating Minimization). Thanks to [~mengxr] recent 
changes, it's straightforward to do it now !

ALM will be capable of solving the following problems: min f(x) + g(z)

1. Loss function f(x) can be LeastSquareLoss, LoglikelihoodLoss and HingeLoss. 
Most likely we will re-use the Gradient interfaces already defined and 
implement LoglikelihoodLoss

2. Constraints g(z) supported are same as above except that we don't support 
affine + bounds yet Aeq x = beq , lb <= x <= ub yet. Most likely we don't need 
that for ML applications

3. For solver we will use breeze.optimize.proximal.NonlinearMinimizer which in 
turn uses projection based solver (SPG) or proximal solvers (ADMM) based on 
convergence speed.

https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/proximal/NonlinearMinimizer.scala

4. The factors will be SparseVector so that we keep shuffle size in check. For 
example we will run with 10K ranks but we will force factors to be 100-sparse.

This is closely related to Sparse LDA 
https://issues.apache.org/jira/browse/SPARK-5564 with the difference that we 
are not using graph representation here.

As we do scaling experiments, we will understand which flow is more suited as 
ratings get denser (my understanding is that since we already scaled ALS to 2 
billion ratings and since we will keep sparsity in check, the same 2 billion 
flow will scale to 10K ranks as well)...

This JIRA is intended to extend the capabilities of ml recommendation to 
generalized loss function.

  was:
Currently ml.recommendation.ALS is optimized for gram matrix generation which 
only scales to modest ranks. The problems that we can solve are in the normal 
equation/quadratic form: 0.5x'Hx + c'x + g(z)

g(z) can be one of the constraints from Breeze proximal library:
https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/proximal/Proximal.scala

In this PR we will re-use ml.recommendation.ALS design and come up with 
ml.recommendation.ALM (Alternating Minimization). Thanks to [~mengxr] recent 
changes, it's straightforward to do it now !

ALM will be capable of solving the following problems: min f(x) + g(z)

1. Loss function f(x) can be LeastSquareLoss, LoglikelihoodLoss and HingeLoss. 
Most likely we will re-use the Gradient interfaces already defined and 
implement LoglikelihoodLoss

2. Constraints g(z) supported are same as above except that we don't support 
affine + bounds yet Aeq x = beq , lb <= x <= ub yet. Most likely we don't need 
that for ML applications

3. For solver we will use breeze.optimize.proximal.NonlinearMinimizer which in 
turn uses projection based solver (SPG) or proximal solvers (ADMM) based on 
convergence speed.

https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/proximal/NonlinearMinimizer.scala

4. The factors will be SparseVector so that we keep shuffle size in check. For 
example we will run with 10K ranks but we will force factors to be 100-sparse.

This is closely related to Sparse LDA 
https://issues.apache.org/jira/browse/SPARK-5564 with the difference that we 
are not using graph representation here.

As we do scaling experiments, we will understand which flow is more suited as 
ratings get denser (my understanding is that since we already scaled ALS to 2 
billion ratings and since we will keep sparsity in check, the same 2 billion 
flow will scale to 10K ranks as well)...

This JIRA is intended to extend the capabilities of Spark's collaborative 
filtering toolkit to generalized loss function.


> Large rank matrix factorization with Nonlinear loss and constraints
> ---
>
> Key: SPARK-6323
> URL: https://issues.apache.org/jira/browse/SPARK-6323
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, MLlib
>Affects Versions: 1.4.0
>Reporter: Debasish Das
> Fix For: 1.4.0
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Currently ml.recommendation.ALS is optimized for gram matrix generation which 
> only scales to modest ranks. The problems that we can solve are in the normal 
> equation/quadratic form: 0.5x'Hx + c'x + g(z)
> g(z) can be one of the constraints

[jira] [Updated] (SPARK-6323) Large rank matrix factorization with Nonlinear loss and constraints

2015-03-13 Thread Debasish Das (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-6323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Debasish Das updated SPARK-6323:

Description: 
Currently ml.recommendation.ALS is optimized for gram matrix generation which 
only scales to modest ranks. The problems that we can solve are in the normal 
equation/quadratic form: 0.5x'Hx + c'x + g(z)

g(z) can be one of the constraints from Breeze proximal library:
https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/proximal/Proximal.scala

In this PR we will re-use ml.recommendation.ALS design and come up with 
ml.recommendation.ALM (Alternating Minimization). Thanks to [~mengxr] recent 
changes, it's straightforward to do it now !

ALM will be capable of solving the following problems: min f(x) + g(z)

1. Loss function f(x) can be LeastSquareLoss, LoglikelihoodLoss and HingeLoss. 
Most likely we will re-use the Gradient interfaces already defined and 
implement LoglikelihoodLoss

2. Constraints g(z) supported are same as above except that we don't support 
affine + bounds yet Aeq x = beq , lb <= x <= ub yet. Most likely we don't need 
that for ML applications

3. For solver we will use breeze.optimize.proximal.NonlinearMinimizer which in 
turn uses projection based solver (SPG) or proximal solvers (ADMM) based on 
convergence speed.

https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/proximal/NonlinearMinimizer.scala

4. The factors will be SparseVector so that we keep shuffle size in check. For 
example we will run with 10K ranks but we will force factors to be 100-sparse.

This is closely related to Sparse LDA 
https://issues.apache.org/jira/browse/SPARK-5564 with the difference that we 
are not using graph representation here.

As we do scaling experiments, we will understand which flow is more suited as 
ratings get denser (my understanding is that since we already scaled ALS to 2 
billion ratings and since we will keep sparsity in check, the same 2 billion 
flow will scale to 10K ranks as well)...

This JIRA is intended to extend the capabilities of Spark's collaborative 
filtering toolkit to generalized loss function.

  was:
Currently ml.recommendation.ALS is optimized for gram matrix generation which 
only scales to modest ranks. The problems that we can solve are in the normal 
equation/quadratic form: 0.5x'Hx + c'x + g(z)

g(z) can be one of the constraints from Breeze proximal library:
https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/proximal/Proximal.scala

In this PR we will re-use ml.recommendation.ALS design and come up with 
ml.recommendation.ALM (Alternating Minimization). Thanks to [~mengxr] recent 
changes, it's straightforward to do it now !

ALM will be capable of solving the following problems: min f(x) + g(z)

1. Loss function f(x) can be LeastSquareLoss, LoglikelihoodLoss and HingeLoss. 
Most likely we will re-use the Gradient interfaces already defined and 
implement LoglikelihoodLoss

2. Constraints g(z) supported are same as above except that we don't support 
affine + bounds yet Aeq x = beq , lb <= x <= ub yet. Most likely we don't need 
that for ML applications

3. For solver we will use breeze.optimize.proximal.NonlinearMinimizer which in 
turn uses projection based solver (SPG) or proximal solvers (ADMM) based on 
convergence speed.

https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/proximal/NonlinearMinimizer.scala

4. The factors will be SparseVector so that we keep shuffle size in check. For 
example we will run with 10K ranks but we will force factors to be 100-sparse.

This is closely related to Sparse LDA 
https://issues.apache.org/jira/browse/SPARK-5564 with the difference that we 
are not using graph representation here.

As we do scaling experiments, we will understand the underlying architecture.

This JIRA is intended to extend the capabilities of Spark's collaborative 
filtering toolkit to generalized loss function.


> Large rank matrix factorization with Nonlinear loss and constraints
> ---
>
> Key: SPARK-6323
> URL: https://issues.apache.org/jira/browse/SPARK-6323
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, MLlib
>Affects Versions: 1.4.0
>Reporter: Debasish Das
> Fix For: 1.4.0
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Currently ml.recommendation.ALS is optimized for gram matrix generation which 
> only scales to modest ranks. The problems that we can solve are in the normal 
> equation/quadratic form: 0.5x'Hx + c'x + g(z)
> g(z) can be one of the constraints from Breeze proximal library:
> https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/proximal/Proximal.scala
> In this PR we will re-use ml.recomm

[jira] [Updated] (SPARK-6323) Large rank matrix factorization with Nonlinear loss and constraints

2015-03-13 Thread Debasish Das (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-6323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Debasish Das updated SPARK-6323:

Description: 
Currently ml.recommendation.ALS is optimized for gram matrix generation which 
only scales to modest ranks. The problems that we can solve are in the normal 
equation/quadratic form: 0.5x'Hx + c'x + g(z)

g(z) can be one of the constraints from Breeze proximal library:
https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/proximal/Proximal.scala

In this PR we will re-use ml.recommendation.ALS design and come up with 
ml.recommendation.ALM (Alternating Minimization). Thanks to [~mengxr] recent 
changes, it's straightforward to do it now !

ALM will be capable of solving the following problems: min f(x) + g(z)

1. Loss function f(x) can be LeastSquareLoss, LoglikelihoodLoss and HingeLoss. 
Most likely we will re-use the Gradient interfaces already defined and 
implement LoglikelihoodLoss

2. Constraints g(z) supported are same as above except that we don't support 
affine + bounds yet Aeq x = beq , lb <= x <= ub yet. Most likely we don't need 
that for ML applications

3. For solver we will use breeze.optimize.proximal.NonlinearMinimizer which in 
turn uses projection based solver (SPG) or proximal solvers (ADMM) based on 
convergence speed.

https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/proximal/NonlinearMinimizer.scala

4. The factors will be SparseVector so that we keep shuffle size in check. For 
example we will run with 10K ranks but we will force factors to be 100-sparse.

This is closely related to Sparse LDA 
https://issues.apache.org/jira/browse/SPARK-5564 with the difference that we 
are not using graph representation here.

As we do scaling experiments, we will understand the underlying architecture.

This JIRA is intended to extend the capabilities of Spark's collaborative 
filtering toolkit to generalized loss function.

  was:
Currently ml.recommendation.ALS is optimized for gram matrix generation which 
only scales to modest ranks. The problems that we can solve are in the normal 
equation/quadratic form: 0.5x'Hx + c'x + g(z)

g(z) can be one of the constraints from Breeze proximal library:
https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/proximal/Proximal.scala

In this PR we will re-use ml.recommendation.ALS design and come up with 
ml.recommendation.ALM (Alternating Minimization). Thanks to [~mengxr] recent 
changes, it's straightforward to do it now !

ALM will be capable of solving the following problems: min f (x) + g (z) 

1. Loss function f(x) can be LeastSquareLoss, LoglikelihoodLoss and HingeLoss. 
Most likely we will re-use the Gradient interfaces already defined and 
implement LoglikelihoodLoss

2. Constraints g(z) supported are same as above except that we don't support 
affine constraint Aeq x = beq , lb <= x <= ub yet. But most likely we don't 
need that for ML applications

3. For solver we will use breeze.optimize.proximal.NonlinearMinimizer which in 
turn uses projection based solver (SPG) or proximal solvers (ADMM) based on 
convergence speed.

https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/proximal/NonlinearMinimizer.scala

4. The factors will be SparseVector so that we keep shuffle size in check. For 
example we will run with 10K ranks but we will force factors to be 100-sparse.

This is closely related to Sparse LDA 
https://issues.apache.org/jira/browse/SPARK-5564 with the difference that we 
are not using graph representation here.

As we do scaling experiments, we will understand the underlying architecture.

This JIRA is intended to extend the capabilities of Spark's collaborative 
filtering toolkit to generalized loss function.


> Large rank matrix factorization with Nonlinear loss and constraints
> ---
>
> Key: SPARK-6323
> URL: https://issues.apache.org/jira/browse/SPARK-6323
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, MLlib
>Affects Versions: 1.4.0
>Reporter: Debasish Das
> Fix For: 1.4.0
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Currently ml.recommendation.ALS is optimized for gram matrix generation which 
> only scales to modest ranks. The problems that we can solve are in the normal 
> equation/quadratic form: 0.5x'Hx + c'x + g(z)
> g(z) can be one of the constraints from Breeze proximal library:
> https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/proximal/Proximal.scala
> In this PR we will re-use ml.recommendation.ALS design and come up with 
> ml.recommendation.ALM (Alternating Minimization). Thanks to [~mengxr] recent 
> changes, it's straightforward to do it now !
> ALM will be capable of solv

[jira] [Updated] (SPARK-6323) Large rank matrix factorization with Nonlinear loss and constraints

2015-03-13 Thread Debasish Das (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-6323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Debasish Das updated SPARK-6323:

Description: 
Currently ml.recommendation.ALS is optimized for gram matrix generation which 
only scales to modest ranks. The problems that we can solve are in the normal 
equation/quadratic form: 0.5x'Hx + c'x + g(z)

g(z) can be one of the constraints from Breeze proximal library:
https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/proximal/Proximal.scala

In this PR we will re-use ml.recommendation.ALS design and come up with 
ml.recommendation.ALM (Alternating Minimization). Thanks to [~mengxr] recent 
changes, it's straightforward to do it now !

ALM will be capable of solving the following problems: min f (x) + g (z) 

1. Loss function f(x) can be LeastSquareLoss, LoglikelihoodLoss and HingeLoss. 
Most likely we will re-use the Gradient interfaces already defined and 
implement LoglikelihoodLoss

2. Constraints g(z) supported are same as above except that we don't support 
affine constraint Aeq x = beq , lb <= x <= ub yet. But most likely we don't 
need that for ML applications

3. For solver we will use breeze.optimize.proximal.NonlinearMinimizer which in 
turn uses projection based solver (SPG) or proximal solvers (ADMM) based on 
convergence speed.

https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/proximal/NonlinearMinimizer.scala

4. The factors will be SparseVector so that we keep shuffle size in check. For 
example we will run with 10K ranks but we will force factors to be 100-sparse.

This is closely related to Sparse LDA 
https://issues.apache.org/jira/browse/SPARK-5564 with the difference that we 
are not using graph representation here.

As we do scaling experiments, we will understand the underlying architecture.

This JIRA is intended to extend the capabilities of Spark's collaborative 
filtering toolkit to generalized loss function.

  was:
Currently ml.recommendation.ALS is optimized for gram matrix generation which 
only scales to modest ranks. The problems that we can solve are in the normal 
equation/quadratic form: 0.5x'Hx + c'x + g(z)

g(z) can be one of the constraints from Breeze proximal library:
https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/proximal/Proximal.scala

In this PR we will re-use ml.recommendation.ALS design and come up with 
ml.recommendation.ALM (Alternating Minimization). Thanks to [~mengxr] recent 
changes, it's straightforward to do it now !

ALM will be capable of solving the following problems: min f(x) + g(z) 

1. Loss function f(x) can be LeastSquareLoss, LoglikelihoodLoss and HingeLoss. 
Most likely we will re-use the Gradient interfaces already defined and 
implement LoglikelihoodLoss

2. Constraints g(z) supported are same as above except that we don't support 
affine constraint Aeq x = beq , lb <= x <= ub yet. But most likely we don't 
need that for ML applications

3. For solver we will use breeze.optimize.proximal.NonlinearMinimizer which in 
turn uses projection based solver (SPG) or proximal solvers (ADMM) based on 
convergence speed.

https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/proximal/NonlinearMinimizer.scala

4. The factors will be SparseVector so that we keep shuffle size in check. For 
example we will run with 10K ranks but we will force factors to be 100-sparse.

This is closely related to Sparse LDA 
https://issues.apache.org/jira/browse/SPARK-5564 with the difference that we 
are not using graph representation here.

As we do scaling experiments, we will understand the underlying architecture.

This JIRA is intended to extend the capabilities of Spark's collaborative 
filtering toolkit to generalized loss function.


> Large rank matrix factorization with Nonlinear loss and constraints
> ---
>
> Key: SPARK-6323
> URL: https://issues.apache.org/jira/browse/SPARK-6323
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, MLlib
>Affects Versions: 1.4.0
>Reporter: Debasish Das
> Fix For: 1.4.0
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Currently ml.recommendation.ALS is optimized for gram matrix generation which 
> only scales to modest ranks. The problems that we can solve are in the normal 
> equation/quadratic form: 0.5x'Hx + c'x + g(z)
> g(z) can be one of the constraints from Breeze proximal library:
> https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/proximal/Proximal.scala
> In this PR we will re-use ml.recommendation.ALS design and come up with 
> ml.recommendation.ALM (Alternating Minimization). Thanks to [~mengxr] recent 
> changes, it's straightforward to do it now !
> ALM will be capable of s

