Hi Jake,

Thank you for the detailed explanation; it seems like a very clever way to
distribute the matrix multiplication process.

I tried running the transpose job on my T matrix using the following
options:

bin/mahout transpose -i
/asf-mail-archives/mahout-0.4/sparse-1-gram-stem/tfidf-matrix/matrix
--numRows 6076937 --numCols 20444

I'm using a cluster with 8 data nodes (EC2 xlarge instances) with 3 reducers
per node and mapred.child.java.opts=-Xmx4096m. The map tasks completed
within a few minutes but then all of my 24 reducers failed near the 70% mark
with error "Task attempt_201103201840_0004_r_000023_0 failed to report
status for 601 seconds. Killing!" The data node servers stayed at a healthy
load avg below 4 and never paged ...

So I increased the "mapred.task.timeout" Hadoop parameter to 20 minutes
instead of the default 10 minutes, and it failed again. The reduce code for
the TransposeJob looks straightforward, so I'm going to have to dig in
deeper to figure out what's causing the problem.

Cheers,
Tim

On Sat, Mar 19, 2011 at 3:34 PM, Jake Mannix <jake.man...@gmail.com> wrote:

> On Sat, Mar 19, 2011 at 10:32 AM, Timothy Potter <thelabd...@gmail.com> wrote:
>
>> Regarding Jake's comment: " ... you need to run the RowIdJob on these
>> tfidf-vectors first ..."
>>
>> I did this and now have an m x n matrix T (m=6076937, n=20444). My SVD
>> eigenvector matrix E is p x q (p=87, q=20444).
>
>
> Ok, so to help you understand what's going on here, I'm going to go into
> a little of the inner details.
>
> You are right, you have a matrix T, with 6,076,937 rows, and each row has
> 20,444 columns (most of which are zero, and it's represented sparsely, but
> still, they live in a vector space of dimension 20,444).  Similarly, you've
> made an eigenvector matrix, which has 87 rows (ie 87 eigenvectors) and
> each of these rows has exactly 20,444 columns (and most likely, they'll
> all be nonzero, because eigenvectors have no reason to be sparse).
>
> In particular, T and E are represented as *lists of rows*, each row is a
> vector of dimension 20,444.  T has six million of these rows, and E has
> only 87.
>
>
>> So to multiply these two matrices, I need to transpose E so that the
>> number of columns in T equals the number of rows in E (i.e., E^T is
>> q x p); the result of the matrixmult would give me an m x p matrix
>> (m=6076937, p=87).
>>
>
> You're exactly right that you want to multiply T by E^T, because you can't
> compute T * E.
>
> The way it turns out in practice, computing the matrix product of two
> matrices as a map-reduce job is efficiently done as a map-side join on
> two row-based matrices with the same number of rows; it's the columns
> that differ.  In particular, if you take a matrix X which is represented
> as a set of numRowsX rows, each of which has numColsX columns, and
> another matrix with numRowsY == numRowsX rows, each of which has
> numColsY columns (!= numColsX), then by summing the outer products of
> each of the numRowsX pairs of vectors, you get a matrix with numRowsZ ==
> numColsX and numColsZ == numColsY (if you instead take the reverse outer
> product of each vector pair, you end up with the transpose of this final
> result, with numRowsZ == numColsY and numColsZ == numColsX).
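The outer-product sum Jake describes can be sketched in plain Python on
hypothetical tiny matrices (the helper names here are illustrative, not
Mahout's actual classes): pairing up the rows of X (r x cX) and Y (r x cY)
and summing their outer products yields Z = X^T * Y, with numRowsZ == cX
and numColsZ == cY.

```python
# One outer product per paired row -- in the real job, each map task
# emits the outer product of one row pair, and the reducers sum them.
def outer(u, v):
    return [[ui * vj for vj in v] for ui in u]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def transpose_times(X, Y):
    """Sum of outer products of paired rows == X^T * Y."""
    Z = [[0] * len(Y[0]) for _ in range(len(X[0]))]
    for x_row, y_row in zip(X, Y):
        Z = add(Z, outer(x_row, y_row))
    return Z

X = [[1, 2], [3, 4], [5, 6]]           # r=3 rows, cX=2 cols
Y = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]  # r=3 rows, cY=3 cols

Z = transpose_times(X, Y)  # cX x cY == 2 x 3, equal to X^T * Y
```

This is why the job wants its two inputs to agree on the *row* count: the
join key is the row index, and the contraction happens over rows.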
>
> Unfortunately, you have a pair of matrices which have different numbers
> of rows, and the same number of columns, but you want a pair of matrices
> with the same number of rows and (possibly) different numbers of columns.
>
>
>> So I tried to run matrixmult with:
>>
>> matrixmult --numRowsA 6076937 --numColsA 20444 --numRowsB 20444 --numColsB
>> 87 \
>> --inputPathA
>> /asf-mail-archives/mahout-0.4/sparse-1-gram-stem/tfidf-matrix/matrix \
>> --inputPathB /asf-mail-archives/mahout-0.4/svd/transpose-244
>
>> (--inputPathA points to the output of the rowid job)
>>
>> This results in:
>> Exception in thread "main" org.apache.mahout.math.CardinalityException:
>>
>
>> In the code, I see the test that row counts must be identical for the two
>> input matrices. Thus, it seems like the job requires me to transpose this
>> large matrix, just to re-transpose it back to its original form during the
>> multiplication? Or have I missed something crucial again?
>>
>
> You actually need to transpose the input matrix (T), and then re-run with
> T^t and E^t (the latter you apparently already have created).
>
> We should really rename the "matrixmultiply" job to be called
> "transposematrixmultiply", because that's what it really does.
>
>   -jake
>
