Re: mllib.recommendation Design

2015-03-30 Thread Xiangrui Meng
On Wed, Mar 25, 2015 at 7:59 AM, Debasish Das debasish.da...@gmail.com wrote:
 Hi Xiangrui,

 I am facing some minor issues implementing Alternating Nonlinear
 Minimization, as documented in this JIRA, due to the ALS code being in the
 ml package: https://issues.apache.org/jira/browse/SPARK-6323

 I need to use Vectors.fromBreeze / Vectors.toBreeze, but they are
 package-private in mllib. For now I removed the private modifier, but I am
 not sure that's the correct way...

We don't expose third-party types in our public APIs. You can either
implement your algorithm under org.apache.spark or copy the
fromBreeze/toBreeze code over.
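For dense vectors, copying the converters over amounts to re-wrapping the backing array. A self-contained sketch with hypothetical stand-in types (`BreezeDense`, `LocalDense`) in place of the real breeze.linalg and mllib vector classes; the real package-private helpers also handle sparse vectors:

```scala
// Hypothetical stand-ins for breeze.linalg.DenseVector[Double] and mllib's
// DenseVector, so the sketch compiles without either dependency.
final case class BreezeDense(data: Array[Double])
final case class LocalDense(values: Array[Double])

object VectorConverters {
  // Both types wrap a plain Array[Double], so the dense case is re-wrapping.
  def fromBreeze(bv: BreezeDense): LocalDense = LocalDense(bv.data)
  def toBreeze(v: LocalDense): BreezeDense = BreezeDense(v.values)
}
```

Round-tripping a vector through `toBreeze`/`fromBreeze` returns an equal vector, which is all the dense conversion needs to guarantee.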


 I also need to reuse a lot of the building blocks from ml.ALS, so I am
 writing ALM in the ml package...


That sounds okay.

 I thought the plan was still to write core algorithms in mllib and the
 pipeline integration in ml... It would be great if you could move the ALS
 object from ml to mllib; that way I can also move ALM to mllib (which I
 feel is the right place)... Of course the Pipeline-based flow will stay in
 the ml package...


Moving it would break compatibility. I think we can be quite flexible
about where we put the implementation.

 We can decide later whether ALM belongs in recommendation or whether a
 better place is a package called factorization, but the idea is that ALM
 will support MAP (and maybe a KL-divergence loss) with sparsity
 constraints (a probability simplex and bounds are fine for what I am
 focused on right now)...
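Both constraint sets mentioned here admit cheap Euclidean projections, which is what makes a proximal/projected update practical. A self-contained sketch of the two projection operators (illustrative names, not code from Spark or Breeze); the simplex case uses the standard sort-based algorithm:

```scala
object Projections {
  // Euclidean projection onto the box [lo, hi], applied elementwise.
  def projectBox(v: Array[Double], lo: Double, hi: Double): Array[Double] =
    v.map(x => math.max(lo, math.min(hi, x)))

  // Euclidean projection onto the probability simplex
  // { x : x_i >= 0, sum_i x_i = 1 }, via the sort-and-threshold algorithm:
  // theta is taken at the largest j with u_j - (cumSum_j - 1)/j > 0.
  def projectSimplex(v: Array[Double]): Array[Double] = {
    val u = v.sorted(Ordering[Double].reverse)
    var cumSum = 0.0
    var theta = 0.0
    var j = 0
    while (j < u.length) {
      cumSum += u(j)
      val t = (cumSum - 1.0) / (j + 1)
      if (u(j) - t > 0) theta = t
      j += 1
    }
    v.map(x => math.max(x - theta, 0.0))
  }
}
```

For example, `projectSimplex(Array(2.0, 0.0))` yields `(1.0, 0.0)` and `projectSimplex(Array(1.0, 1.0))` yields `(0.5, 0.5)`, both nonnegative and summing to one.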


I'm really sorry about the late response on this. It is partly because
I'm still not sure whether there are many applications that need this
feature. Please do list some published work to help us understand the
need.

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: mllib.recommendation Design

2015-03-30 Thread Debasish Das
For ALM I have started experimenting with the following:

1. RMSE and MAP improvements from a log-likelihood loss over a
least-squares loss.

2. Factorization for datasets that are not ratings (basically an
improvement over implicit ratings).

3. Sparse topic generation using PLSA. We are directly optimizing the
likelihood under constraints here, so I feel it will improve upon the EM
algorithm. Also, the current LDA does not produce sparse topics, and ALM
results can augment the LDA flow. I am studying the LDA flow to see
whether the sparsity and log-likelihood optimization can be added there.

I will understand more as I see the results. I am not sure whether this is
supported by public packages like GraphLab or scikit-learn, but the PLSA
papers show interesting results.
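For point (1), the difference is visible on a single observed count x and prediction p: least squares penalizes symmetrically around x, while a Poisson-style log-likelihood (one plausible likelihood for count data; an illustration, not the ALM code) penalizes under-prediction of a count more than over-prediction:

```scala
object Losses {
  // Squared error: symmetric in (x - p).
  def squared(x: Double, p: Double): Double = 0.5 * (x - p) * (x - p)

  // Negative Poisson log-likelihood, dropping the constant log(x!):
  //   -log P(x | p) = p - x * log(p) + const,  minimized at p = x.
  def negPoissonLL(x: Double, p: Double): Double = p - x * math.log(p)
}
```

Both losses are minimized at p = x, but for x = 2 the likelihood loss charges p = 1 more than p = 3 (1.0 vs about 0.80), whereas squared error charges both equally (0.5).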

Re: mllib.recommendation Design

2015-03-25 Thread Debasish Das
Hi Xiangrui,

I am facing some minor issues implementing Alternating Nonlinear
Minimization, as documented in this JIRA, due to the ALS code being in the
ml package: https://issues.apache.org/jira/browse/SPARK-6323

I need to use Vectors.fromBreeze / Vectors.toBreeze, but they are
package-private in mllib. For now I removed the private modifier, but I am
not sure that's the correct way...

I also need to reuse a lot of the building blocks from ml.ALS, so I am
writing ALM in the ml package...

I thought the plan was still to write core algorithms in mllib and the
pipeline integration in ml... It would be great if you could move the ALS
object from ml to mllib; that way I can also move ALM to mllib (which I
feel is the right place)... Of course the Pipeline-based flow will stay in
the ml package...

We can decide later whether ALM belongs in recommendation or whether a
better place is a package called factorization, but the idea is that ALM
will support MAP (and maybe a KL-divergence loss) with sparsity
constraints (a probability simplex and bounds are fine for what I am
focused on right now)...

Thanks.
Deb


Re: mllib.recommendation Design

2015-02-17 Thread Debasish Das
There is a usability difference... I am not sure whether recommendation.ALS
would want to add both a userConstraint and a productConstraint? GraphLab
CF, for example, has them, and we are ready to support all the features for
modest ranks where the Gram matrices can be formed...

For large ranks I am still working on the code.


Re: mllib.recommendation Design

2015-02-17 Thread Xiangrui Meng
The current ALS implementation allows pluggable solvers for
NormalEquation, where we put the CholeskySolver and the NNLS solver. Please
check the current implementation and let us know how your constraint
solver would fit. For a general matrix factorization package, let's
make a JIRA and move our discussion there. -Xiangrui
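As a sketch of the pluggable-solver shape described here: a solver interface over the accumulated k x k normal equation, with a naive Gaussian eliminator standing in for the Cholesky/NNLS solvers, and a crude clipped variant showing where a constrained solver would slot in. Names and signatures are illustrative, not copied from the Spark source, and clipping the unconstrained solution is only an approximation (exact box-constrained least squares needs an iterative method such as NNLS):

```scala
trait NESolver {
  // Solve ata * x = atb for the k x k normal equation of one user/item.
  def solve(ata: Array[Array[Double]], atb: Array[Double]): Array[Double]
}

// Naive Gaussian elimination; fine for the tiny systems that arise per factor.
object GaussianSolver extends NESolver {
  def solve(ata: Array[Array[Double]], atb: Array[Double]): Array[Double] = {
    val k = atb.length
    val a = ata.map(_.clone)
    val b = atb.clone
    for (p <- 0 until k) {           // forward elimination
      val pivot = a(p)(p)
      for (r <- p + 1 until k) {
        val f = a(r)(p) / pivot
        for (c <- p until k) a(r)(c) -= f * a(p)(c)
        b(r) -= f * b(p)
      }
    }
    val x = new Array[Double](k)     // back substitution
    for (r <- (k - 1) to 0 by -1) {
      var s = b(r)
      for (c <- r + 1 until k) s -= a(r)(c) * x(c)
      x(r) = s / a(r)(r)
    }
    x
  }
}

// A constrained solver plugs in through the same trait; here it just clips
// the unconstrained solution into [lo, hi] as a rough illustration.
class ClippedSolver(lo: Double, hi: Double) extends NESolver {
  def solve(ata: Array[Array[Double]], atb: Array[Double]): Array[Double] =
    GaussianSolver.solve(ata, atb).map(x => math.max(lo, math.min(hi, x)))
}
```

For example, on the diagonal system ata = diag(2, 4), atb = (2, 4), `GaussianSolver.solve` returns (1, 1).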

On Fri, Feb 13, 2015 at 7:46 AM, Debasish Das debasish.da...@gmail.com wrote:
 Hi,

 I am a bit confused about the mllib design in master. I thought that core
 algorithms would stay in mllib and ml would define the pipelines over the
 core algorithms, but it looks like in master ALS has moved from mllib to
 ml...

 I am refactoring my PR into a factorization package and I want to build it
 on top of ml.recommendation.ALS (possibly extending ml.recommendation.ALS,
 since the first version will use very similar RDD handling to ALS and a
 proximal solver that's being added to Breeze):

 https://issues.apache.org/jira/browse/SPARK-2426
 https://github.com/scalanlp/breeze/pull/321

 Basically I am not sure whether we should merge it with recommendation.ALS,
 since this is more generic than recommendation. I am considering calling it
 ConstrainedALS, where the user can specify different constraints for the
 user and product factors (similar to the GraphLab CF structure).
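One possible shape for the per-side constraint API being described, with hypothetical names (the constraint is enforced on each factor vector after its update):

```scala
object ConstrainedALSSketch {
  // Hypothetical constraint ADT; a real design would live alongside ml.ALS.
  sealed trait Constraint
  case object Unconstrained extends Constraint
  case object Nonnegative extends Constraint
  final case class Box(lo: Double, hi: Double) extends Constraint

  // GraphLab-CF style: independent constraints for the two factor matrices.
  final case class Params(
      rank: Int,
      userConstraint: Constraint,
      productConstraint: Constraint)

  // Project a factor vector onto the feasible set of its constraint.
  def enforce(c: Constraint, v: Array[Double]): Array[Double] = c match {
    case Unconstrained => v
    case Nonnegative   => v.map(math.max(0.0, _))
    case Box(lo, hi)   => v.map(x => math.max(lo, math.min(hi, x)))
  }
}
```

With this shape, `Params(rank = 20, userConstraint = Nonnegative, productConstraint = Unconstrained)` would express a one-sided nonnegative factorization.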

 I am also working on ConstrainedALM, where the underlying algorithm is no
 longer ALS but nonlinear alternating minimization with constraints:
 https://github.com/scalanlp/breeze/pull/364
 This will let us do large-rank matrix completion where there is no need to
 construct the Gram matrices. I will open up the JIRA soon after getting
 initial results.

 I am a bit confused about where I should add the factorization package. It
 will use the current ALS test cases, and I have to construct more test
 cases for the sparse coding and PLSA formulations.

 Thanks.
 Deb

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org