Re: seq2sparse and lsi fold-in

Jake Mannix Thu, 06 Jan 2011 13:46:08 -0800

Dmitriy,

  I'm not sure if you figured this out on your own and I didn't see the
email, but if not:

On Thu, Dec 30, 2010 at 3:57 PM, Dmitriy Lyubimov <[email protected]> wrote:

> Also, if i have a bunch of new documents to fold-in, it looks like i'd need
> to run a matrix multiplication job between new document vectors and V, both
> matrices represented row-wise. So DistributedRowMatrix should help me,
> shouldn't it? do i need to transpose the first matrix first?
>

If you have a dense matrix V of eigenvectors (i.e., it has K rows of dense
vectors, where K is small, in the hundreds, and each row has cardinality M,
which may be large), stored as a DistributedRowMatrix, and you have your
original document matrix C, which has N rows, each of cardinality M, then
you actually need to take the transpose of *both* matrices and then call
DistributedRowMatrix.times() on them:

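  // V is K x M and C is N x M, both DistributedRowMatrix instances
  DistributedRowMatrix V_transpose = V.transpose();   // M x K
  DistributedRowMatrix C_transpose = C.transpose();   // M x N
  DistributedRowMatrix C_times_V_transpose = C_transpose.times(V_transpose);   // N x K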

These calls yield the mathematical result C * V^T, an N x K matrix with one
K-dimensional row per document, which is probably what you want.
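
To see why transposing both operands gives C * V^T, here is a minimal
in-memory sketch using plain arrays rather than Mahout. It assumes that
times() multiplies the transpose of the receiver by the argument, which is
what the result above implies; the class and method names (FoldInSketch,
transposeTimes, foldIn) are made up for illustration and are not part of
any library.

```java
import java.util.Arrays;

final class FoldInSketch {

    // Plain transpose: rows of the input become columns of the output.
    static double[][] transpose(double[][] a) {
        int rows = a.length;
        int cols = a[0].length;
        double[][] t = new double[cols][rows];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                t[j][i] = a[i][j];
            }
        }
        return t;
    }

    // ASSUMPTION: models times() as a^T * b, so a and b must have the
    // same number of rows. Each shared row contributes one outer product.
    static double[][] transposeTimes(double[][] a, double[][] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("row counts differ");
        }
        int n = a[0].length;
        int k = b[0].length;
        double[][] out = new double[n][k];
        for (int r = 0; r < a.length; r++) {
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < k; j++) {
                    out[i][j] += a[r][i] * b[r][j];
                }
            }
        }
        return out;
    }

    // Fold-in: c is N x M (documents), v is K x M (eigenvectors).
    // (c^T)^T * v^T = c * v^T, an N x K matrix.
    static double[][] foldIn(double[][] c, double[][] v) {
        return transposeTimes(transpose(c), transpose(v));
    }

    public static void main(String[] args) {
        double[][] v = {{1, 0, 0}, {0, 1, 0}};                          // K = 2, M = 3
        double[][] c = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}, {10, 11, 12}}; // N = 4, M = 3
        System.out.println(Arrays.deepToString(foldIn(c, v)));
    }
}
```

With V chosen as the first two unit vectors, each output row is just the
first two coordinates of the corresponding document, which makes the N x K
shape easy to check by eye.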

(It turns out that this set of operations could also be done in a custom
operation using the row-paths of both V and C as inputs, but you'd still
require two MapReduce shuffles to get the answer, so it wouldn't really
save anything.)

  -jake
