On Sun, Jun 26, 2011 at 1:47 PM, Vincent Xue <[email protected]> wrote:
> Hi. > > I was wondering how useful an in memory sparse matrix multiplier would > be for mahout. In my current project I needed to multiply many large > sparse matrices but submitting hundreds of jobs to the cluster creates > too much overhead. > > I wrote up an implementation of sparse matrix multiplication using > threads which can multiply a 30,000 x 48,0000 matrix by its transpose > in about 5 minutes using 16 cores. Granted this matrix is composed > mostly of 1s, and -1s, (with about 16 elements per row), is this > considered fast? I have seen that my implementation is much faster > than iterating though a matrix naively and would like some input to > whether or not my 5 minute benchmark is by skewed. > about 16 elements per row (and column, I'm guessing?), means if you did this using the algorithm I usually use for matrix multiplication (matrix summing the outer products of the columns), then you're looking at the matrix sum of 48k matrices, each of which only has 16^2 = 256 entries. So it should take only O(256 * 48k) = 12 million multiplications and 12 million additions. This seems like it should be doable in well under a second using this algorithm. On the other hand, if you're doing 30k * 30k = 900M dot products, most of which are zero, but require 16 operations each to discover this, you'll need closer to 15 billion operations. This would be lot slower. -jake > > Many thanks for the input, > Vincent >
