Yes, but one int/vector pair corresponds to the respective column of A multiplied by a single element of the respective row of B, correct? So the concatenation of the resulting columns would be the outer product of the column of A and the row of B. None of these vectors are summed with each other; rather, it is the outer products from multiple map tasks that get summed. So what is the job of the combiner here? It would be nice if the combiner could sum up all outer products computed on that datanode, but this is the part I can't see happening in Hadoop. Is the general statement correct that a combiner is applied only to the outputs of a single *map task*, and that a map task processes all key-value pairs of its split? In this case there is only one key-value pair per split, right? Here, the int/vector pair is the index and the column (or row) of the matrix.
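To make the question concrete, here is a plain-Python sketch of what a single map call would emit for one column/row pair. It assumes the key is the column index j within the row of B and the value is the column of A scaled by b_kj. `map_outer_product` and `outer_product` are hypothetical names, not Mahout or Hadoop API. The emitted vectors are exactly the nonzero columns of the outer product, and their keys are all distinct, so a combiner that only sees this one pair would have nothing to sum.

```python
from typing import List, Tuple

Vector = List[float]


def map_outer_product(col_a: Vector, row_b: Vector) -> List[Tuple[int, Vector]]:
    """Emit one (j, b_kj * a_k) pair per nonzero entry b_kj of the row of B.

    Each emitted vector is column j of the outer product a_k b_k^T.
    """
    pairs = []
    for j, b_kj in enumerate(row_b):
        if b_kj != 0:  # zero entries contribute nothing, so nothing is emitted
            pairs.append((j, [b_kj * a_ik for a_ik in col_a]))
    return pairs


def outer_product(col_a: Vector, row_b: Vector) -> List[Vector]:
    """Reference outer product a_k b_k^T, stored as a list of columns."""
    return [[a_ik * b_kj for a_ik in col_a] for b_kj in row_b]


a_k = [1.0, 2.0]       # one column of A
b_k = [3.0, 0.0, 4.0]  # the matching row of B
emitted = map_outer_product(a_k, b_k)
print(emitted)  # [(0, [3.0, 6.0]), (2, [4.0, 8.0])]
```

Concatenating the emitted columns (with an all-zero column 1) reproduces `outer_product(a_k, b_k)`.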
2012/9/26 Jake Mannix <[email protected]>

> On Wed, Sep 26, 2012 at 4:49 AM, Sigurd Spieckermann <
> [email protected]> wrote:
>
> > Hi guys,
> >
> > I'm trying to understand the way the combiner in Mahout SVD works.
> > (https://cwiki.apache.org/MAHOUT/dimensional-reduction.html) As far as I
> > know from the Mahout math matrix-multiplication implementation, matrix A
> > is represented by column-vectors, matrix B is represented by row vectors
> > and an inner join executes an outer product of the columns of A with the
> > rows of B. All outer products are summed by the combiners and reducers.
> > What I am wondering about is how a combiner can actually combine multiple
> > outer products on the same datanode because the join-package requires the
> > data to be partitioned into unsplittable files. In this case, I understand
> > that one file contains one column/row of its corresponding matrix. Hence,
> > each map task receives a column-row-tuple, computes the outer product and
> > emits the result.
>
> This all sounds right, but not the following:
>
> > My understanding of Hadoop is that the combiner follows a map task
> > immediately but one map task produces only a single result so there is
> > nothing to combine.
>
> That part is not true - a mapper may emit more than one key-value pair (and
> for matrix multiplication, this is true *a fortiori* - there is one
> int/vector pair emitted per nonzero element of the row being mapped over).
>
> > If the combiner could accumulate the results of multiple map task, I
> > would understand the idea, but from my understanding and tests, it does
> > not.
> >
> > Could anyone clarify the process please?
> >
> > Thanks a lot!
> > Sigurd
>
> --
>
> -jake
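The scope question can be simulated in plain Python under one explicit assumption: the combiner runs only on the output of a single map task, and a map task maps every record in its split. The names `map_pair`, `sum_by_key` and `run_job` are hypothetical and this is not Hadoop or Mahout code. If one split holds both column/row pairs, the combiner already merges partial columns of the two outer products. If each split holds a single pair, the combiner sees only distinct keys and has nothing to merge, so all summing happens in the reducer. The final result equals the columns of A·B either way.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Pair = Tuple[int, Vector]


def map_pair(col_a: Vector, row_b: Vector) -> List[Pair]:
    """One (j, b_kj * a_k) pair per nonzero entry of the row of B."""
    return [(j, [b * a for a in col_a]) for j, b in enumerate(row_b) if b != 0]


def sum_by_key(pairs: List[Pair]) -> Dict[int, Vector]:
    """Sum vectors that share a key; used both as combiner and as reducer."""
    acc: Dict[int, Vector] = {}
    for j, vec in pairs:
        if j in acc:
            acc[j] = [x + y for x, y in zip(acc[j], vec)]
        else:
            acc[j] = list(vec)
    return acc


def run_job(splits: List[List[Tuple[Vector, Vector]]]):
    """Each split is the input of one map task: a list of (col_a, row_b) records."""
    per_task = []
    for split in splits:
        # One map task maps every record of its split...
        task_output = [p for col_a, row_b in split for p in map_pair(col_a, row_b)]
        # ...and the combiner only ever sees this task's output.
        per_task.append((len(task_output), sum_by_key(task_output)))
    # Shuffle + reduce: sum the combined partial columns across all tasks.
    shuffled = [(j, v) for _, combined in per_task for j, v in combined.items()]
    return sum_by_key(shuffled), per_task


cols_a = [[1, 3], [2, 4]]  # columns of A = [[1, 2], [3, 4]]
rows_b = [[5, 6], [7, 8]]  # rows of B = [[5, 6], [7, 8]]
one_task, stats1 = run_job([list(zip(cols_a, rows_b))])
two_tasks, stats2 = run_job([[(cols_a[0], rows_b[0])], [(cols_a[1], rows_b[1])]])
print(one_task)  # {0: [19, 43], 1: [22, 50]}
```

In `stats1` the single task's combiner turns 4 emitted pairs into 2; in `stats2` each task's combiner receives 2 pairs with distinct keys and outputs them unchanged.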
