Re: Spark Matrix Factorization

Dmitriy Lyubimov Fri, 03 Jan 2014 10:45:06 -0800

On Fri, Jan 3, 2014 at 10:28 AM, Sebastian Schelter <[email protected]> wrote:


> > I wonder if anyone might have a recommendation on a Scala-native
> > implementation of SVD.
>
> Mahout has a Scala implementation of an SVD variant called Stochastic SVD:
>
>
> https://svn.apache.org/viewvc/mahout/trunk/math-scala/src/main/scala/org/apache/mahout/math/scalabindings/SSVD.scala?view=markup


Mahout also has SVD and eigendecompositions mapped to Scala as svd() and
eigen(). Unfortunately I have not put this on the wiki yet, but a summary is
available here: https://issues.apache.org/jira/browse/MAHOUT-1297

Mahout also has a distributed PCA implementation (which is based on
distributed Stochastic SVD and has special provisions for sparse matrix
cases). Unfortunately our wiki is in flux right now due to the migration
from Confluence to the CMS, and the SSVD page has not been migrated yet, so
the Confluence version is here:
https://cwiki.apache.org/confluence/display/MAHOUT/Stochastic+Singular+Value+Decomposition


>
> Otherwise, all the major java math libraries (mahout math, jblas,
> commons-math) should provide an implementation that you can use in scala.
>
> --sebastian
>
> > C
> >
> >
> >
> >
> > On Thu, Jan 2, 2014 at 7:06 PM, Ameet Talwalkar <[email protected]> wrote:
> >
> >> Hi Deb,
> >>
> >> Thanks for your email.  We currently do not have a DSGD implementation in
> >> MLlib. Also, just to clarify, DSGD is not a variant of ALS, but rather a
> >> different algorithm for solving the same bi-convex objective function.
> >>
> >> It would be a good thing to add, but to the best of my knowledge, no
> >> one is actively working on this right now.
> >>
> >> Also, as you mentioned, the ALS implementation in mllib is more
> >> robust/scalable than the one in spark.examples.
> >>
> >> -Ameet
> >>
> >>
> >> On Thu, Jan 2, 2014 at 3:16 PM, Debasish Das <[email protected]> wrote:
> >>
> >>> Hi,
> >>>
> >>> I am not noticing any DSGD implementation of ALS in Spark.
> >>>
> >>> There are two ALS implementations.
> >>>
> >>> org.apache.spark.examples.SparkALS does not run on large matrices and
> >>> seems more like demo code.
> >>>
> >>> org.apache.spark.mllib.recommendation.ALS looks like a more robust
> >>> version, and I am experimenting with it.
> >>>
> >>> References here are Jellyfish, Twitter's implementation of Jellyfish
> >>> called Scalafish, a Google paper called Sparkler, and a similar idea put
> >>> forward in the IBM paper by Gemulla et al. ("Large-Scale Matrix
> >>> Factorization with Distributed Stochastic Gradient Descent").
> >>>
> >>> https://github.com/azymnis/scalafish
> >>>
> >>> Are there any plans to add DSGD to Spark, or is there an existing
> >>> JIRA?
> >>>
> >>> Thanks.
> >>> Deb
> >>>
> >>>
> >>
> >
> >
>
>
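
For anyone following the ALS vs. DSGD distinction in the quoted thread: both
minimize a regularized squared error over the observed ratings, which is
convex in U when V is fixed and vice versa. Below is a minimal single-process
sketch in Python of the stratified SGD schedule described by Gemulla et al.
It is not Spark, MLlib, or Mahout code. The function names, the toy ratings,
the modulo-based blocking, and the hyperparameters are all assumptions chosen
for illustration, and a real DSGD would run the blocks of each stratum on
separate workers rather than in a loop.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def objective(ratings, U, V, lam):
    """Regularized squared error, summed over the observed entries only."""
    return sum(
        (r - dot(U[i], V[j])) ** 2 + lam * (dot(U[i], U[i]) + dot(V[j], V[j]))
        for i, j, r in ratings
    )


def sgd_step(U, V, i, j, r, lr, lam):
    """One stochastic gradient step on a single observed entry (i, j, r)."""
    err = r - dot(U[i], V[j])
    u_old, v_old = U[i], V[j]
    U[i] = [u + lr * (err * v - lam * u) for u, v in zip(u_old, v_old)]
    V[j] = [v + lr * (err * u - lam * v) for u, v in zip(u_old, v_old)]


def dsgd(ratings, n_rows, n_cols, rank=2, d=2, epochs=200,
         lr=0.02, lam=0.05, seed=0):
    """Simulate the DSGD schedule with d x d blocks (hypothetical helper)."""
    rng = random.Random(seed)
    U = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_rows)]
    V = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_cols)]

    # Assumed blocking: row i goes to row-block i % d, column j to j % d.
    blocks = {}
    for i, j, r in ratings:
        blocks.setdefault((i % d, j % d), []).append((i, j, r))

    start_loss = objective(ratings, U, V, lam)
    for _ in range(epochs):
        for s in range(d):
            # Stratum s: blocks (b, (b + s) % d) touch disjoint rows and
            # columns, so a distributed version could process them in
            # parallel. Here they are simply processed one after another.
            for b in range(d):
                block = blocks.get((b, (b + s) % d), [])
                rng.shuffle(block)
                for i, j, r in block:
                    sgd_step(U, V, i, j, r, lr, lam)
    return U, V, start_loss, objective(ratings, U, V, lam)


# Toy (row, column, rating) triples, made up for this example.
RATINGS = [
    (0, 0, 5.0), (0, 1, 3.0), (0, 3, 1.0),
    (1, 0, 4.0), (1, 3, 1.0),
    (2, 1, 1.0), (2, 2, 5.0), (2, 3, 4.0),
    (3, 0, 1.0), (3, 2, 4.0), (3, 3, 5.0),
]

U, V, start_loss, end_loss = dsgd(RATINGS, n_rows=4, n_cols=4)
print("objective before:", start_loss, "after:", end_loss)
```

The blocks visited within one stratum share no rows or columns, which is what
lets DSGD update them concurrently without conflicting writes. ALS, by
contrast, alternates exact least-squares solves for U and V on the same
objective.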
