Re: mllib.recommendation Design
On Wed, Mar 25, 2015 at 7:59 AM, Debasish Das <debasish.da...@gmail.com> wrote:

> Hi Xiangrui,
>
> I am facing some minor issues implementing Alternating Nonlinear
> Minimization, as documented in this JIRA, due to the ALS code being in the
> ml package: https://issues.apache.org/jira/browse/SPARK-6323
>
> I need to use Vectors.fromBreeze / Vectors.toBreeze, but they are
> package-private in mllib. For now I removed private, but I am not sure
> that's the correct way...

We don't expose third-party types in our public APIs. You can either
implement your algorithm under org.apache.spark or copy the
fromBreeze/toBreeze code over.

> I also need to re-use a lot of building blocks from ml.ALS, so I am
> writing ALM in the ml package...

That sounds okay.

> I thought the plan was to still write core algorithms in mllib and
> pipeline integration in ml... It would be great if you could move the ALS
> object from ml to mllib; that way I can also move ALM to mllib (which I
> feel is the right place)... Of course the Pipeline-based flow will stay
> in the ml package...

Moving it would break compatibility. I think we can be quite flexible about
where the implementation lives.

> We can decide later whether ALM belongs in recommendation or in a new
> package called factorization, but the idea is that ALM will support MAP
> (and maybe KL-divergence loss) with sparsity constraints (the probability
> simplex and bounds are fine for what I am focused on right now)...

I'm really sorry about the late response. That is partly because I'm still
not sure whether many applications need this feature. Please do list some
public work to help us understand the need. Thanks.

> Thanks.
> Deb
>
> On Tue, Feb 17, 2015 at 4:40 PM, Debasish Das <debasish.da...@gmail.com> wrote:
>
> There is a usability difference... I am not sure whether
> recommendation.ALS would want to add both a userConstraint and a
> productConstraint.
> GraphLab CF, for example, has them, and we are ready to support all these
> features for modest ranks, where the Gram matrices can be formed... For
> large ranks I am still working on the code.
>
> On Tue, Feb 17, 2015 at 3:19 PM, Xiangrui Meng <men...@gmail.com> wrote:
>
> The current ALS implementation allows pluggable solvers for
> NormalEquation, where we put the CholeskySolver and the NNLS solver.
> Please check the current implementation and let us know how your
> constraint solver would fit. For a general matrix factorization package,
> let's make a JIRA and move our discussion there. -Xiangrui
>
> On Fri, Feb 13, 2015 at 7:46 AM, Debasish Das <debasish.da...@gmail.com> wrote:
>
> Hi,
>
> I am a bit confused about the mllib design in master. I thought that the
> core algorithms would stay in mllib and ml would define the pipelines
> over them, but it looks like in master ALS has moved from mllib to ml...
>
> I am refactoring my PR into a factorization package, and I want to build
> it on top of ml.recommendation.ALS (possibly extending
> ml.recommendation.ALS, since the first version will use very similar RDD
> handling to ALS plus a proximal solver that is being added to Breeze):
> https://issues.apache.org/jira/browse/SPARK-2426
> https://github.com/scalanlp/breeze/pull/321
>
> Basically, I am not sure whether we should merge it with
> recommendation.ALS, since it is more generic than recommendation. I am
> considering calling it ConstrainedALS, where the user can specify
> different constraints for the user and product factors (similar to the
> GraphLab CF structure).
>
> I am also working on ConstrainedALM, where the underlying algorithm is no
> longer ALS but nonlinear alternating minimization with constraints:
> https://github.com/scalanlp/breeze/pull/364
> This will let us do large-rank matrix completion, where there is no need
> to construct the Gram matrices. I will open the JIRA soon after getting
> initial results.
>
> I am a bit confused about where I should add the factorization package.
> It will use the current ALS test cases, and I have to construct more test
> cases for the sparse coding and PLSA formulations.
>
> Thanks.
> Deb

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org
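To make the pluggable-solver design discussed above concrete, here is a rough, self-contained sketch (not Spark's actual code; all names are illustrative): each least-squares subproblem is reduced to a normal equation A^T A x = A^T b, and interchangeable back ends (Cholesky for the unconstrained case, projected coordinate descent for a nonnegativity constraint) plug into the same accumulated system.

```python
import math

def normal_equation(rows, targets):
    """Accumulate A^T A and A^T b from (row, target) observations."""
    k = len(rows[0])
    ata = [[0.0] * k for _ in range(k)]
    atb = [0.0] * k
    for a, b in zip(rows, targets):
        for i in range(k):
            atb[i] += a[i] * b
            for j in range(k):
                ata[i][j] += a[i] * a[j]
    return ata, atb

def cholesky_solve(ata, atb, reg=1e-6):
    """Unconstrained solve of (A^T A + reg*I) x = A^T b via Cholesky."""
    k = len(atb)
    m = [row[:] for row in ata]
    for i in range(k):
        m[i][i] += reg
    # Factor m = L L^T (m is symmetric positive definite after regularization).
    L = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(L[i][p] * L[j][p] for p in range(j))
            if i == j:
                L[i][j] = math.sqrt(m[i][i] - s)
            else:
                L[i][j] = (m[i][j] - s) / L[j][j]
    # Forward substitution L y = atb, then back substitution L^T x = y.
    y = [0.0] * k
    for i in range(k):
        y[i] = (atb[i] - sum(L[i][p] * y[p] for p in range(i))) / L[i][i]
    x = [0.0] * k
    for i in reversed(range(k)):
        x[i] = (y[i] - sum(L[p][i] * x[p] for p in range(i + 1, k))) / L[i][i]
    return x

def nnls_solve(ata, atb, iters=500, reg=1e-6):
    """Nonnegative solve of the same system via projected coordinate descent."""
    k = len(atb)
    m = [row[:] for row in ata]
    for i in range(k):
        m[i][i] += reg
    x = [0.0] * k
    for _ in range(iters):
        for i in range(k):
            # Exact minimization over x[i], then projection onto x[i] >= 0.
            r = atb[i] - sum(m[i][j] * x[j] for j in range(k) if j != i)
            x[i] = max(0.0, r / m[i][i])
    return x
```

A constraint solver of the kind proposed in the thread would slot in as another function with the same `(ata, atb) -> x` signature, which is what makes the design pluggable.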
Re: mllib.recommendation Design
For ALM I have started experimenting with the following:

1. RMSE and MAP improvements from a log-likelihood loss over a
least-squares loss.

2. Factorization for datasets that are not ratings (basically an
improvement over implicit ratings).

3. Sparse topic generation using PLSA. We are directly optimizing the
likelihood under constraints here, so I feel it will improve on the EM
algorithm. Also, the current LDA does not produce sparse topics, and the
ALM results can augment the LDA flow. I am studying the LDA flow to see
whether the sparsity and log-likelihood optimization can be added there.

I will understand more as I see the results. I am not sure whether this is
supported by public packages like GraphLab or scikit-learn, but the PLSA
papers show interesting results.
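The probability-simplex constraint mentioned in this thread for sparse topic factors can be sketched with the standard sort-based Euclidean projection (this is an illustrative pure-Python sketch, not code from Spark or Breeze):

```python
def project_simplex(v):
    """Return argmin_x ||x - v||_2 subject to x >= 0 and sum(x) == 1.

    Standard sort-and-threshold algorithm: find the largest prefix of the
    sorted values whose running average shift keeps entries positive, then
    shift and clip. Entries pushed to exactly zero give the sparsity that
    the thread discusses for topic factors.
    """
    u = sorted(v, reverse=True)
    css = 0.0          # running cumulative sum of the sorted values
    rho, rho_css = 0, 0.0
    for i, ui in enumerate(u):
        css += ui
        if ui - (css - 1.0) / (i + 1) > 0:
            rho, rho_css = i + 1, css
    theta = (rho_css - 1.0) / rho
    return [max(0.0, x - theta) for x in v]
```

Used as a proximal/projection step inside an alternating solver, this keeps each factor column on the simplex while zeroing out small entries, e.g. `project_simplex([0.9, 0.1, -2.0])` clips the negative entry to zero without disturbing the rest.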
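Finally, the alternating skeleton behind the "ConstrainedALS" idea in this thread can be shown end to end on a toy problem: a rank-1 nonnegative factorization of a fully observed matrix by alternating exact per-factor updates with projection onto the nonnegative orthant. This is only a sketch under simplifying assumptions (dense matrix, rank 1, nonnegative entries, no missing ratings); the real ALS distributes the subproblems over RDDs.

```python
def rank1_nmf(R, iters=100):
    """Alternating rank-1 nonnegative factorization of a dense matrix R.

    Fixing one factor, each entry of the other factor has a closed-form
    least-squares solution, which is then clipped at zero (the projection
    step a constrained solver would apply). Assumes R has nonnegative
    entries so the denominators stay positive from the all-ones start.
    """
    m, n = len(R), len(R[0])
    u = [1.0] * m
    v = [1.0] * n
    for _ in range(iters):
        vv = sum(x * x for x in v)
        u = [max(0.0, sum(R[i][j] * v[j] for j in range(n)) / vv)
             for i in range(m)]
        uu = sum(x * x for x in u)
        v = [max(0.0, sum(R[i][j] * u[i] for i in range(m)) / uu)
             for j in range(n)]
    return u, v
```

On an exactly rank-1 nonnegative matrix such as [[1, 2], [2, 4]], the alternation recovers factors whose outer product reproduces the matrix; swapping the clipped closed-form update for a different constrained solver (bounds, simplex, proximal operators) changes only the inner step, which is the point of the proposed design.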