Thank you, Jake. Yes, I had figured that out, and it seems that DRM.times does just that. I just wasn't sure about the production quality of this code: DRM seems to have seen a lot of fixes and discussion lately, including around simple multiplication.

On a side note, the fold-in actually needs C x V^t x Sigma^-1. But I have an option on the stochastic SVD command line to compute V x Sigma^0.5 instead of V, and U x Sigma^0.5 instead of U, in which case the correction for singular vectors indeed turns into the simple multiplication C x V^t and the singular values matrix can be ignored (especially if one wants to measure similarities between a user and an item, not just user-user or item-item).

-d

On Thu, Jan 6, 2011 at 1:45 PM, Jake Mannix <[email protected]> wrote:

> Dmitriy,
>
> I'm not sure if you figured this out on your own and I didn't see the
> email, but if not:
>
> On Thu, Dec 30, 2010 at 3:57 PM, Dmitriy Lyubimov <[email protected]> wrote:
>
> > Also, if i have a bunch of new documents to fold-in, it looks like i'd
> > need to run a matrix multiplication job between new document vectors
> > and V, both matrices represented row-wise. So DistributedRowMatrix
> > should help me, shouldn't it? do i need to transpose the first matrix
> > first?
>
> If you have a dense matrix V of eigenvectors (ie, it has K (a small number
> like 100's) rows of dense vectors, each of which are cardinality M (which
> may be large)), which is a DistributedRowMatrix, and you have your original
> document matrix C, which has N rows, each of which has cardinality M, then
> you actually need to take the transpose of *both* matrices, then take
> the DistributedRowMatrix.times() on these:
>
>   V_transpose = V.transpose();
>   C_transpose = C.transpose();
>   C_times_V_transpose = C_transpose.times(V_transpose);
>
> This code will yield the mathematical result of C * V^T, which is probably
> what you want.
>
> (it turns out that this set of operations could also be done in a custom
> operation using the row-paths of both V and C as inputs, but you'd still
> require two MapReduce shuffles to get the answer, so it's not really a
> savings to do this).
>
> -jake
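
For reference, here is a minimal in-memory sketch of the fold-in arithmetic discussed in this thread, C x V^t x Sigma^-1, using plain Java arrays. It is not Mahout code and does not touch DistributedRowMatrix; the class name FoldInSketch and the method foldIn are hypothetical, and it assumes the layout from Jake's message (V stored as K rows of cardinality M, C as N rows of cardinality M).

```java
import java.util.Arrays;

// Hypothetical illustration only; names are not part of Mahout.
class FoldInSketch {

    // Computes C * V^T * Sigma^-1.
    // c:     N x M matrix of new document rows
    // v:     K x M matrix, one singular vector per row (assumed layout)
    // sigma: the K singular values, assumed non-zero
    static double[][] foldIn(double[][] c, double[][] v, double[] sigma) {
        int n = c.length;
        int k = v.length;
        double[][] out = new double[n][k];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < k; j++) {
                // Entry (i, j) of C * V^T is the dot product of row i of C and row j of V.
                double dot = 0.0;
                for (int t = 0; t < c[i].length; t++) {
                    dot += c[i][t] * v[j][t];
                }
                // Right-multiplying by Sigma^-1 scales column j by 1 / sigma[j].
                out[i][j] = dot / sigma[j];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] c = {{1.0, 2.0, 0.0}};                    // one new document, M = 3
        double[][] v = {{1.0, 0.0, 0.0}, {0.0, 1.0, 0.0}};   // K = 2 singular vectors
        double[] sigma = {2.0, 4.0};
        System.out.println(Arrays.toString(foldIn(c, v, sigma)[0])); // prints [0.5, 0.5]
    }
}
```

Dropping the division by sigma[j] leaves the plain C x V^t product that the transpose-and-times sequence quoted above is described as producing.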
