On Sun, Jun 26, 2011 at 1:47 PM, Vincent Xue <[email protected]> wrote:

> Hi.
>
> I was wondering how useful an in-memory sparse matrix multiplier would
> be for Mahout. In my current project I needed to multiply many large
> sparse matrices, but submitting hundreds of jobs to the cluster creates
> too much overhead.
>
> I wrote up an implementation of sparse matrix multiplication using
> threads which can multiply a 30,000 x 48,000 matrix by its transpose
> in about 5 minutes using 16 cores. Granted, this matrix is composed
> mostly of 1s and -1s, with about 16 elements per row. Is this
> considered fast? I have seen that my implementation is much faster
> than naively iterating through a matrix, and I would like some input
> on whether or not my 5-minute benchmark is skewed.
>

About 16 elements per row (and per column, I'm guessing?) means that if
you did this using the algorithm I usually use for matrix multiplication
(taking the matrix sum of the outer products of the columns), you're
looking at the sum of 48k sparse matrices, each of which has only
16^2 = 256 entries.  So it should take only O(256 * 48k) = 12 million
multiplications and 12 million additions.  This seems like it should
be doable in well under a second using this algorithm.
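A minimal Python sketch of that outer-product approach (this is not
Mahout code, and the column-wise dict-of-dicts storage is just an
assumption for illustration): since the rows of A^T are the columns of
A, the product A * A^T is the sum, over columns, of each column's outer
product with itself.

```python
from collections import defaultdict

def sparse_self_outer_product(cols):
    """Compute A * A^T for a sparse matrix A given column-wise.

    cols: dict mapping column index -> {row index: value}.
    A column with k nonzeros contributes a k*k outer product,
    so total work is the sum of nnz(col)^2 over all columns --
    about 48k * 256 multiply-adds for the matrix in the thread.
    """
    result = defaultdict(float)  # (row_i, row_j) -> value
    for col in cols.values():
        for i, vi in col.items():
            for j, vj in col.items():
                result[(i, j)] += vi * vj
    return dict(result)

# Tiny example: A = [[1, 0], [-1, 1]], stored column-wise.
A_cols = {0: {0: 1.0, 1: -1.0}, 1: {1: 1.0}}
product = sparse_self_outer_product(A_cols)
# A * A^T = [[1, -1], [-1, 2]]
```

Note that only entries touched by some column's nonzeros ever get
written, so the zero structure of the result falls out for free.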

On the other hand, if you're doing 30k * 30k = 900M dot products,
most of which are zero but require 16 operations each to discover
that, you'll need closer to 15 billion operations.  That would be a
lot slower.
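A quick sanity check of the arithmetic in the two estimates above,
using the dimensions from the thread (30k x 48k, ~16 nonzeros per
row/column):

```python
# Back-of-the-envelope operation counts for the two algorithms.
rows, cols, nnz = 30_000, 48_000, 16

# Outer-product algorithm: one 16x16 outer product per column.
outer_product_ops = cols * nnz ** 2        # 12,288,000 -- "12 million"

# All-pairs dot products: 30k x 30k results, ~16 ops to evaluate each.
dot_product_ops = rows * rows * nnz        # 14,400,000,000 -- "~15 billion"

ratio = dot_product_ops / outer_product_ops  # roughly 1,000x more work
```

The three-orders-of-magnitude gap is why the 5-minute benchmark looks
slow relative to what the outer-product formulation should allow.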

  -jake


>
> Many thanks for the input,
> Vincent
>
