Dmitriy, I'm not sure if you figured this out on your own and I didn't see the email, but if not:
On Thu, Dec 30, 2010 at 3:57 PM, Dmitriy Lyubimov <[email protected]> wrote:

> Also, if i have a bunch of new documents to fold-in, it looks like i'd need
> to run a matrix multiplication job between new document vectors and V, both
> matrices represented row-wise. So DistributedRowMatrix should help me,
> shouldn't it? do i need to transpose the first matrix first?
>

If you have a dense matrix V of eigenvectors (i.e., it has K rows of dense vectors, where K is small, in the hundreds, and each row has cardinality M, which may be large), stored as a DistributedRowMatrix, and you have your original document matrix C, which has N rows, each of cardinality M, then you actually need to take the transpose of *both* matrices and then call DistributedRowMatrix.times() on them:

  V_transpose = V.transpose();
  C_transpose = C.transpose();
  C_times_V_transpose = C_transpose.times(V_transpose);

This code will yield the mathematical result of C * V^T, which is probably what you want.

(It turns out that this set of operations could also be done in a custom operation using the row-paths of both V and C as inputs, but you'd still require two MapReduce shuffles to get the answer, so it's not really a savings to do this.)

-jake
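
As a sanity check on the shapes, here is a minimal in-memory sketch in plain Java. It does not use Mahout at all. It assumes, as the message above implies, that the row-wise times() multiplies the transpose of the receiver by its argument, which is why both inputs are transposed first. The class FoldInSketch and the helpers transpose, transposeTimes and timesTranspose are hypothetical names made up for this illustration, and the small matrices are made-up data.

```java
import java.util.Arrays;

public class FoldInSketch {

    // Plain transpose of a dense, row-major matrix.
    static double[][] transpose(double[][] a) {
        int rows = a.length, cols = a[0].length;
        double[][] t = new double[cols][rows];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                t[j][i] = a[i][j];
            }
        }
        return t;
    }

    // Hypothetical stand-in for the row-wise product: returns a^T * b.
    // ASSUMPTION: this mirrors the semantics the email relies on.
    // a is R x P, b is R x Q, the result is P x Q.
    static double[][] transposeTimes(double[][] a, double[][] b) {
        int r = a.length, p = a[0].length, q = b[0].length;
        double[][] out = new double[p][q];
        for (int row = 0; row < r; row++) {
            for (int i = 0; i < p; i++) {
                for (int j = 0; j < q; j++) {
                    out[i][j] += a[row][i] * b[row][j];
                }
            }
        }
        return out;
    }

    // Direct C * V^T, used only to cross-check the result.
    // c is N x M, v is K x M, the result is N x K.
    static double[][] timesTranspose(double[][] c, double[][] v) {
        int n = c.length, m = c[0].length, k = v.length;
        double[][] out = new double[n][k];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < k; j++) {
                for (int l = 0; l < m; l++) {
                    out[i][j] += c[i][l] * v[j][l];
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] c = {{1, 0, 2}, {0, 3, 1}}; // documents: N = 2, M = 3
        double[][] v = {{1, 1, 0}, {0, 1, 1}}; // eigenvectors: K = 2, M = 3

        // Transpose both, then apply the a^T * b product: (C^T)^T * V^T = C * V^T.
        double[][] viaTransposes = transposeTimes(transpose(c), transpose(v));
        double[][] direct = timesTranspose(c, v);

        System.out.println(Arrays.deepToString(viaTransposes)); // [[1.0, 2.0], [3.0, 4.0]]
        System.out.println(Arrays.deepEquals(viaTransposes, direct)); // true
    }
}
```

The result is N x K: one row per document, one column per eigenvector, which is the folded-in representation asked about.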
