Pardon me: the above was an OLS example, of course. Ridge would require a small
modification, namely adding the regularization rate to the main diagonal of
the self-squared X (i.e., X'X):

val XtX = (drmX.t %*% drmX).collect
val w = solve(XtX + diag(lambda, drmX.ncol), (drmX.t %*% y).collect(::, 0))
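For readers without a Mahout setup, the same diagonal correction can be sketched in-core in plain Scala. This is a minimal illustration under stated assumptions, not Mahout's API or solver: `RidgeSketch`, `gram`, `xty`, `solve` and `ridge` are hypothetical helpers, the tiny dataset is made up, and the solve uses Gaussian elimination with partial pivoting simply because it is short. It forms X'X, adds lambda to its main diagonal, and solves (X'X + lambda*I) w = X'y.

```scala
object RidgeSketch {
  type Mat = Array[Array[Double]]

  // X'X, the "self-squared" X: an ncol x ncol matrix
  def gram(x: Mat): Mat = {
    val n = x.head.length
    Array.tabulate(n, n)((i, j) => x.map(r => r(i) * r(j)).sum)
  }

  // X'y: a vector of length ncol
  def xty(x: Mat, y: Array[Double]): Array[Double] = {
    val n = x.head.length
    Array.tabulate(n)(j => x.indices.map(r => x(r)(j) * y(r)).sum)
  }

  // Hypothetical in-core solver: Gaussian elimination with partial pivoting.
  // Works on copies, so the inputs are not modified.
  def solve(a0: Mat, b0: Array[Double]): Array[Double] = {
    val n = b0.length
    val a = a0.map(_.clone)
    val b = b0.clone
    for (k <- 0 until n) {
      // Swap in the row with the largest pivot candidate
      val p = (k until n).maxBy(i => math.abs(a(i)(k)))
      val tr = a(k); a(k) = a(p); a(p) = tr
      val tb = b(k); b(k) = b(p); b(p) = tb
      // Eliminate column k below the pivot
      for (i <- k + 1 until n) {
        val f = a(i)(k) / a(k)(k)
        for (j <- k until n) a(i)(j) -= f * a(k)(j)
        b(i) -= f * b(k)
      }
    }
    // Back substitution
    val w = new Array[Double](n)
    for (i <- n - 1 to 0 by -1) {
      val s = (i + 1 until n).map(j => a(i)(j) * w(j)).sum
      w(i) = (b(i) - s) / a(i)(i)
    }
    w
  }

  // Ridge: solve (X'X + lambda * I) w = X'y
  def ridge(x: Mat, y: Array[Double], lambda: Double): Array[Double] = {
    val g = gram(x)
    for (i <- g.indices) g(i)(i) += lambda // regularization on the main diagonal
    solve(g, xty(x, y))
  }

  def main(args: Array[String]): Unit = {
    val x: Mat = Array(Array(1.0, 0.0), Array(0.0, 1.0), Array(1.0, 1.0))
    val y = Array(1.0, 2.0, 3.0)
    println(ridge(x, y, 0.0).mkString(", ")) // lambda = 0 is plain OLS
    println(ridge(x, y, 1.0).mkString(", ")) // lambda = 1 shrinks the weights
  }
}
```

With lambda = 0 this reduces to the OLS normal equations; on this made-up data y = x1 + 2*x2 exactly, so w is approximately (1, 2). With lambda = 1 the system becomes [[3, 1], [1, 3]] w = [4, 5], giving w of roughly (0.875, 1.375), i.e. both coefficients shrink toward zero.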

On Sun, Sep 21, 2014 at 5:52 PM, Dmitriy Lyubimov <[email protected]> wrote:

> There are a few things going on with DRM.
>
> First, the Hadoop/MapReduce DRM in Mahout is pretty much constrained to its
> persistent format on HDFS (row-wise key/row-vector pairs).
>
> When we moved to Scala, this notion was expanded further: DRM became one of
> the types governed by the R-like DSL and by the algebraic optimizer for
> such expressions. E.g., a distributed ridge regression solution under this
> DSL, for a dataset represented by a tall and skinny matrix X, would look
> something like this:
>
> val drmX = drmFromHdfs("X")
> val y = .. (y observation vector)
>
> val w = solve((drmX.t %*% drmX).collect, (drmX.t %*% y).collect(::, 0))
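Because X is tall and skinny, X'X is only ncol x ncol, and both X'X and X'y are sums of per-row contributions; this is one reason the computation distributes well over row-wise data, with only a small in-core solve at the end. The plain-Scala sketch below illustrates just that arithmetic under stated assumptions: it does not use Mahout, it is not necessarily how Mahout's optimizer physically plans the expression, the two "partitions" and the data are made up, and `NormalEquationsSketch`, `partial`, `combine` and `solve2` are hypothetical helpers (the closed-form 2x2 solve is only enough for a two-column X).

```scala
object NormalEquationsSketch {
  type Row = (Int, Array[Double])                    // (row key, row vector), row-wise layout
  type Partial = (Array[Array[Double]], Array[Double]) // partial (X'X, X'y)

  // Per-partition partial sums of X'X and X'y; y is looked up by row key
  def partial(rows: Seq[Row], y: Map[Int, Double], n: Int): Partial = {
    val xtx = Array.ofDim[Double](n, n)
    val xty = new Array[Double](n)
    for ((key, r) <- rows; i <- 0 until n) {
      xty(i) += r(i) * y(key)
      for (j <- 0 until n) xtx(i)(j) += r(i) * r(j)
    }
    (xtx, xty)
  }

  // Combine partial results from two partitions by element-wise summation
  def combine(a: Partial, b: Partial): Partial =
    (a._1.zip(b._1).map { case (ra, rb) => ra.zip(rb).map { case (p, q) => p + q } },
     a._2.zip(b._2).map { case (p, q) => p + q })

  // Closed-form 2x2 solve (Cramer's rule), enough for this two-column example
  def solve2(a: Array[Array[Double]], b: Array[Double]): Array[Double] = {
    val det = a(0)(0) * a(1)(1) - a(0)(1) * a(1)(0)
    Array((b(0) * a(1)(1) - a(0)(1) * b(1)) / det,
          (a(0)(0) * b(1) - a(1)(0) * b(0)) / det)
  }

  def main(args: Array[String]): Unit = {
    val y = Map(0 -> 1.0, 1 -> 2.0, 2 -> 3.0)
    val part1: Seq[Row] = Seq(0 -> Array(1.0, 0.0), 1 -> Array(0.0, 1.0))
    val part2: Seq[Row] = Seq(2 -> Array(1.0, 1.0))
    val (xtx, xty) = combine(partial(part1, y, 2), partial(part2, y, 2))
    println(solve2(xtx, xty).mkString(", ")) // OLS w = (X'X)^-1 X'y; prints 1.0, 2.0
  }
}
```

The combine step is associative, so partial results could be merged in any order, which is what makes a reduce-style aggregation over partitions possible.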
>
> Next, the algebraic optimizer optimizes the execution plan for a particular
> engine, one of them being Spark's RDDs. Mahout RDDs in their checkpoint
> format (i.e., a fully formed intermediate RDD result) have a dual
> representation: either row-wise (tuples of key and row vector) or
> block-wise (an array of keys -> a vertical/horizontal matrix block).
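To make the dual representation concrete, here is a plain-Scala sketch of converting one partition between the two forms: row-wise (key, row vector) tuples, and a block-wise pair of a key array plus a matrix block whose i-th row belongs to the i-th key. This illustrates the layout only, not Mahout's implementation: `DualRepresentationSketch`, `blockify` and `deblockify` are hypothetical names, and the dense array-of-arrays block stands in for Mahout's in-core matrix type.

```scala
import scala.reflect.ClassTag

object DualRepresentationSketch {
  type RowWise[K] = Seq[(K, Array[Double])]           // tuples of (key, row vector)
  type BlockWise[K] = (Array[K], Array[Array[Double]]) // keys + stacked row block

  // Row-wise -> block-wise: stack a partition's rows into one block,
  // keeping keys in a parallel array (keys(i) labels block row i)
  def blockify[K: ClassTag](rows: RowWise[K]): BlockWise[K] =
    (rows.map(_._1).toArray, rows.map(_._2).toArray)

  // Block-wise -> row-wise: pair each key with its row of the block
  def deblockify[K](block: BlockWise[K]): RowWise[K] =
    block._1.toSeq.zip(block._2.toSeq)

  def main(args: Array[String]): Unit = {
    val rows: RowWise[String] = Seq("a" -> Array(1.0, 2.0), "b" -> Array(3.0, 4.0))
    val (keys, block) = blockify(rows)
    println(keys.mkString(","))                            // prints a,b
    println(block.map(_.mkString(" ")).mkString("; "))     // prints 1.0 2.0; 3.0 4.0
    println(deblockify((keys, block)).map(_._1).mkString(",")) // prints a,b
  }
}
```

The block form suits operations that want a whole matrix slice at once (e.g. a local matrix multiply per partition), while the row-wise form matches the persistent key/vector layout.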
>
> Finally, assuming the back-end engine is Spark, it is possible to wrap
> certain RDD types into the DRM type and, vice versa, to get access to the
> checkpoint RDD (e.g., drmX.rdd automatically creates a checkpoint and
> exports the matrix data as an RDD).
>
> For further details, I would hope the Mahout/Spark page makes this a
> bit clearer. There is also a talk, with slides, from the last Mahout
> meetup discussing the main ideas here.
>
> -d
>
> On Sun, Sep 21, 2014 at 3:34 AM, kalmohsen <[email protected]> wrote:
>
>> I have been continuously reading about Mahout, Hadoop, Spark and Scala,
>> hoping to be able to add value to them. However, I am confused about two
>> things: Spark RDD and Mahout DRM.
>> I know that Spark's RDD is used when working with Mahout. However, I
>> came across some Scala code that uses Mahout DRM or wraps an RDD into a
>> DRM.
>>
>> Could anyone clarify the difference between them?
>>
>> Thanks in advance
>> Regards
>>
>
