Pardon me, the above was an OLS example, of course. Ridge would require a small modification: adding the regularization rate (lambda) to the main diagonal of the self-squared X, i.e. of X'X (drmX.t %*% drmX):
val w = solve (drmX.t %*% drmX + diag(lambda, drmX.ncol), drmX.t %*% y)

On Sun, Sep 21, 2014 at 5:52 PM, Dmitriy Lyubimov <[email protected]> wrote:

> There are a few things going on with DRM.
>
> First, the Hadoop/MapReduce DRM in Mahout is pretty much constrained to its
> persistent format on HDFS (row-wise row key/vector pairs).
>
> When we moved to Scala, this notion was expanded further: DRM became one of
> the types governed by the R-like DSL and by the algebraic optimizer of such
> algebraic expressions. E.g. a distributed ridge regression solution under
> such a DSL, for a dataset represented by a tall and skinny matrix X, would
> look something like this:
>
> val drmX = drmFromHdfs("X")
> val y = .. (y observation vector)
>
> val w = solve (drmX.t %*% drmX, drmX.t %*% y)
>
> Next, the algebraic optimizer optimizes the execution plan for a particular
> engine, one of them being Spark's RDDs. Mahout RDDs in their checkpoint
> format (i.e. a fully-formed intermediate RDD result) have a dual
> representation: either row-wise (tuples of key, row vector) or block-wise
> (array of keys -> matrix vertical/horizontal block).
>
> Finally, assuming the back-end engine is Spark's RDDs, it is possible to wrap
> certain RDD types into the DRM type and, vice versa, to get access to the
> checkpoint RDD (e.g. drmX.rdd automatically creates a checkpoint and exports
> the matrix data as an RDD).
>
> For further details, I would hope the Mahout/Spark page makes it a bit
> clearer. There's also a talk and slides from the last Mahout meetup
> discussing the main ideas here.
>
> -d
>
> On Sun, Sep 21, 2014 at 3:34 AM, kalmohsen <[email protected]> wrote:
>
>> I am continuously reading about Mahout, Hadoop, Spark and Scala, hoping
>> to be able to add value to them. However, I am confused about two things:
>> Spark RDD and Mahout DRM.
>> I do know that Spark's RDD is used when working with Mahout. However, I
>> came across some Scala code which uses Mahout DRM or wraps an RDD into a
>> DRM.
>>
>> Thus, could anyone clarify the difference between them?
>>
>> Thanks in advance,
>> Regards
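For anyone who wants to see what the two solve calls above compute, here is a minimal in-memory sketch in plain Scala (no Mahout or Spark involved). It forms X'X + lambda*I and X'y explicitly and solves the resulting p x p system by Gaussian elimination with partial pivoting. RidgeSketch and its methods are hypothetical names for illustration only, not Mahout's solve or diag. Note that X'X is only p x p no matter how many rows X has, which is why the normal-equations approach suits a tall and skinny X.

```scala
// Hypothetical illustration in plain Scala; not Mahout's API.
object RidgeSketch {

  // X'X + lambda * I for a dense n x p matrix X (rows are observations).
  def gramWithRidge(x: Array[Array[Double]], lambda: Double): Array[Array[Double]] = {
    val p = x.head.length
    val a = Array.ofDim[Double](p, p)
    for (row <- x; i <- 0 until p; j <- 0 until p) a(i)(j) += row(i) * row(j)
    for (i <- 0 until p) a(i)(i) += lambda // regularization on the main diagonal
    a
  }

  // X'y as a length-p vector.
  def xty(x: Array[Array[Double]], y: Array[Double]): Array[Double] = {
    val p = x.head.length
    val b = Array.ofDim[Double](p)
    for ((row, yi) <- x.zip(y); j <- 0 until p) b(j) += row(j) * yi
    b
  }

  // Solves a * w = b by Gaussian elimination with partial pivoting.
  def solve(a0: Array[Array[Double]], b0: Array[Double]): Array[Double] = {
    val n = b0.length
    val a = a0.map(_.clone) // work on copies so inputs are not mutated
    val b = b0.clone
    for (k <- 0 until n) {
      // swap in the row with the largest absolute value in column k
      val piv = (k until n).maxBy(r => math.abs(a(r)(k)))
      val tmpRow = a(k); a(k) = a(piv); a(piv) = tmpRow
      val tmpB = b(k); b(k) = b(piv); b(piv) = tmpB
      // eliminate column k below the pivot
      for (r <- k + 1 until n) {
        val f = a(r)(k) / a(k)(k)
        for (c <- k until n) a(r)(c) -= f * a(k)(c)
        b(r) -= f * b(k)
      }
    }
    // back substitution on the upper-triangular system
    val w = Array.ofDim[Double](n)
    for (k <- n - 1 to 0 by -1) {
      val s = (k + 1 until n).map(c => a(k)(c) * w(c)).sum
      w(k) = (b(k) - s) / a(k)(k)
    }
    w
  }

  // Ridge: w = (X'X + lambda I)^-1 X'y; lambda = 0.0 gives plain OLS.
  def ridge(x: Array[Array[Double]], y: Array[Double], lambda: Double): Array[Double] =
    solve(gramWithRidge(x, lambda), xty(x, y))
}
```

For example, with rows (1, 0), (0, 1), (1, 1) and y = (1, 2, 3), lambda = 0.0 recovers the exact fit w = (1, 2), while lambda = 1.0 shrinks it to w = (0.875, 1.375).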
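The dual checkpoint representation mentioned in the quoted reply (row-wise key/row-vector tuples vs. block-wise arrays of keys paired with a matrix block) can be pictured with ordinary Scala collections. This is a hypothetical sketch: DrmLayoutSketch, toBlocks, toRows and blockSize are illustrative names, not Mahout's API, and it works on an in-memory Seq rather than a Spark RDD. It only shows that both layouts carry the same rows and keys, grouped differently.

```scala
import scala.reflect.ClassTag

// Hypothetical, in-memory illustration of the two layouts; not Mahout's API.
object DrmLayoutSketch {
  // Row-wise layout: one (key, row vector) tuple per matrix row.
  type RowWise[K] = Seq[(K, Array[Double])]
  // Block-wise layout: an array of row keys paired with the dense block of
  // those rows (a horizontal slice of the matrix).
  type Block[K] = (Array[K], Array[Array[Double]])

  // Groups consecutive rows into blocks of at most blockSize rows.
  def toBlocks[K: ClassTag](rows: RowWise[K], blockSize: Int): Seq[Block[K]] = {
    require(blockSize > 0, "blockSize must be positive")
    rows.grouped(blockSize).map { chunk =>
      (chunk.map(_._1).toArray, chunk.map(_._2).toArray)
    }.toVector
  }

  // Flattens blocks back into (key, row) tuples; keys and rows stay aligned.
  def toRows[K](blocks: Seq[Block[K]]): RowWise[K] =
    blocks.flatMap { case (keys, block) => keys.toSeq.zip(block.toSeq) }
}
```

Grouping three rows with blockSize = 2 yields two blocks (keys 0, 1 and key 2), and toRows restores the original row-wise sequence.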
