It's in Mahout 0.9, which should be in the very final stages now.

On Fri, Jan 3, 2014 at 10:51 AM, Debasish Das <[email protected]> wrote:

> Hi Dmitri,
>
> We have a Mahout mirror from GitHub, but I don't see any of the math-scala
> code.
>
> Where can I see the math-scala code? I thought the GitHub mirror is kept in
> sync with the svn repo.
>
> Thanks.
> Deb
>
>
>
> On Fri, Jan 3, 2014 at 10:43 AM, Dmitriy Lyubimov <[email protected]> wrote:
>
>>
>>
>>
>> On Fri, Jan 3, 2014 at 10:28 AM, Sebastian Schelter <[email protected]> wrote:
>>
>>> > I wonder if anyone might have a recommendation on a native Scala
>>> > implementation of SVD.
>>>
>>> Mahout has a scala implementation of an SVD variant called Stochastic
>>> SVD:
>>>
>>>
>>> https://svn.apache.org/viewvc/mahout/trunk/math-scala/src/main/scala/org/apache/mahout/math/scalabindings/SSVD.scala?view=markup
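>>>
>>> A minimal in-core call might look like the sketch below (the ssvd()
>>> entry point and its (U, V, s) return come from that file; the k/p/q
>>> parameter names, meaning rank, oversampling and power iterations, are
>>> my assumption, not a verified signature):
>>>
>>>   import org.apache.mahout.math.scalabindings._
>>>
>>>   // small dense matrix built with the scalabindings DSL
>>>   val a = dense((1.0, 2.0, 3.0), (3.0, 4.0, 5.0), (-1.0, 0.0, 2.0))
>>>
>>>   // rank-2 stochastic SVD with slight oversampling and one power iteration
>>>   val (u, v, s) = ssvd(a, k = 2, p = 1, q = 1)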
>>
>>
>> Mahout also has SVD and eigen decompositions mapped to Scala as svd()
>> and eigen(). Unfortunately I have not put it on the wiki yet, but the
>> summary is available here: https://issues.apache.org/jira/browse/MAHOUT-1297
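>>
>> A rough sketch of what that looks like (the exact return tuples below are
>> my reading of MAHOUT-1297, so treat them as assumptions):
>>
>>   import org.apache.mahout.math.scalabindings._
>>
>>   val a = dense((2.0, 1.0), (1.0, 2.0))
>>
>>   // full SVD via the scala binding; assumed to return (U, V, singular values)
>>   val (u, v, s) = svd(a)
>>
>>   // eigen decomposition of the symmetric matrix; assumed to return
>>   // (eigenvectors, eigenvalues)
>>   val (vectors, values) = eigen(a)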
>>
>> Mahout also has a distributed PCA implementation (which is based on
>> distributed Stochastic SVD and has special provisions for the sparse
>> matrix case). Unfortunately our wiki is in flux right now due to the
>> migration from Confluence to CMS, so the SSVD page has not been migrated
>> yet; the Confluence version is here:
>> https://cwiki.apache.org/confluence/display/MAHOUT/Stochastic+Singular+Value+Decomposition
>>
>>
>>>
>>> Otherwise, all the major java math libraries (mahout math, jblas,
>>> commons-math) should provide an implementation that you can use in scala.
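>>>
>>> For example, with commons-math (assuming the 3.x linear package) the
>>> dense SVD is directly callable from Scala; a minimal sketch:
>>>
>>>   import org.apache.commons.math3.linear.{Array2DRowRealMatrix, SingularValueDecomposition}
>>>
>>>   val data = Array(Array(1.0, 2.0), Array(3.0, 4.0), Array(5.0, 6.0))
>>>   val svd = new SingularValueDecomposition(new Array2DRowRealMatrix(data))
>>>   val u  = svd.getU                 // left singular vectors
>>>   val s  = svd.getSingularValues    // singular values, descending order
>>>   val vt = svd.getVT                // transposed right singular vectors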
>>>
>>> --sebastian
>>>
>>> > C
>>> >
>>> >
>>> >
>>> >
>>> > On Thu, Jan 2, 2014 at 7:06 PM, Ameet Talwalkar <[email protected]> wrote:
>>> >
>>> >> Hi Deb,
>>> >>
>>> >> Thanks for your email. We currently do not have a DSGD implementation
>>> >> in MLlib. Also, just to clarify, DSGD is not a variant of ALS, but
>>> >> rather a different algorithm for solving the same bi-convex objective
>>> >> function.
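>>> >>
>>> >> For concreteness, the objective both methods minimize (following
>>> >> Gemulla et al.) is the regularized squared error over the observed
>>> >> entries Z:
>>> >>
>>> >>   L(U, V) = \sum_{(i,j) \in Z} (r_{ij} - u_i^T v_j)^2
>>> >>             + \lambda (\|U\|_F^2 + \|V\|_F^2)
>>> >>
>>> >> It is convex in U with V fixed and vice versa, but not jointly convex;
>>> >> ALS alternates exact least-squares solves for U and V, while DSGD takes
>>> >> stochastic gradient steps over disjoint blocks of the ratings matrix.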
>>> >>
>>> >> It would be a good thing to add, but to the best of my knowledge, no
>>> >> one is actively working on this right now.
>>> >>
>>> >> Also, as you mentioned, the ALS implementation in mllib is more
>>> >> robust/scalable than the one in spark.examples.
>>> >>
>>> >> -Ameet
>>> >>
>>> >>
>>> >> On Thu, Jan 2, 2014 at 3:16 PM, Debasish Das <[email protected]> wrote:
>>> >>
>>> >>> Hi,
>>> >>>
>>> >>> I am not seeing a DSGD implementation of ALS in Spark.
>>> >>>
>>> >>> There are two ALS implementations.
>>> >>>
>>> >>> org.apache.spark.examples.SparkALS does not run on large matrices and
>>> >>> seems more like demo code.
>>> >>>
>>> >>> org.apache.spark.mllib.recommendation.ALS looks like the more robust
>>> >>> version, and I am experimenting with it.
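>>> >>>
>>> >>> A minimal usage sketch of that API (assuming a SparkContext sc, e.g.
>>> >>> in the shell; the file path and parameter values are just placeholders):
>>> >>>
>>> >>>   import org.apache.spark.mllib.recommendation.{ALS, Rating}
>>> >>>
>>> >>>   // parse "user,product,rating" lines into Rating objects
>>> >>>   val ratings = sc.textFile("ratings.csv").map { line =>
>>> >>>     val Array(user, product, rating) = line.split(',')
>>> >>>     Rating(user.toInt, product.toInt, rating.toDouble)
>>> >>>   }
>>> >>>
>>> >>>   // rank 20, 10 iterations, regularization lambda 0.01
>>> >>>   val model = ALS.train(ratings, 20, 10, 0.01)
>>> >>>   val prediction = model.predict(1, 42)   // user 1, product 42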
>>> >>>
>>> >>> Relevant references here are Jellyfish, Twitter's implementation of
>>> >>> Jellyfish called Scalafish, the Google paper called Sparkler, and the
>>> >>> similar idea put forward in the IBM paper by Gemulla et al.
>>> >>> ("Large-Scale Matrix Factorization with Distributed Stochastic
>>> >>> Gradient Descent").
>>> >>>
>>> >>> https://github.com/azymnis/scalafish
>>> >>>
>>> >>> Are there any plans to add DSGD to Spark, or are there any existing
>>> >>> JIRAs?
>>> >>>
>>> >>> Thanks.
>>> >>> Deb
>>> >>>
>>> >>>
>>> >>
>>> >
>>> >
>>>
>>>
>>
>
