On Wed, Sep 26, 2012 at 4:49 AM, Sigurd Spieckermann <[email protected]> wrote:
> Hi guys,
>
> I'm trying to understand the way the combiner in Mahout SVD works.
> (https://cwiki.apache.org/MAHOUT/dimensional-reduction.html) As far as I
> know from the Mahout math matrix-multiplication implementation, matrix A is
> represented by column-vectors, matrix B is represented by row-vectors, and
> an inner join executes an outer product of the columns of A with the rows
> of B. All outer products are summed by the combiners and reducers. What I
> am wondering about is how a combiner can actually combine multiple outer
> products on the same datanode, because the join package requires the data
> to be partitioned into unsplittable files. In this case, I understand that
> one file contains one column/row of its corresponding matrix. Hence, each
> map task receives a column-row tuple, computes the outer product, and
> emits the result. This all sounds right, but not the following: my
> understanding of Hadoop is that the combiner follows a map task
> immediately, but one map task produces only a single result, so there is
> nothing to combine.

That part is not true - a mapper may emit more than one key-value pair (and
for matrix multiplication, this is true *a fortiori* - there is one
int/vector pair emitted per nonzero element of the row being mapped over).

> If the combiner could accumulate the results of multiple map tasks, I
> would understand the idea, but from my understanding and tests, it does
> not.
>
> Could anyone clarify the process please?
>
> Thanks a lot!
> Sigurd

--
-jake
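For readers following along, here is a minimal Python sketch (not Mahout's actual Java implementation) of the scheme described above: the product A*B is the sum over i of the outer product of column i of A with row i of B, and the mapper emits one (int, vector) pair per nonzero element of the column it holds - which is why a combiner has multiple pairs to pre-sum. The function names and the dict-of-dicts sparse representation are illustrative assumptions, not Mahout APIs.

```python
from collections import defaultdict

def mapper(col_a, row_b):
    # col_a: sparse column i of A as {row_index: value}
    # row_b: sparse row i of B as {col_index: value}
    # Emit the outer product col_a (x) row_b one row at a time:
    # one (int, vector) pair per nonzero element of col_a.
    for k, a_ki in col_a.items():
        yield k, {j: a_ki * b_ij for j, b_ij in row_b.items()}

def combine_or_reduce(pairs):
    # Combiner and reducer do the same thing here: sum all the
    # partial row-vectors that share a row index k.
    acc = defaultdict(lambda: defaultdict(float))
    for k, vec in pairs:
        for j, v in vec.items():
            acc[k][j] += v
    return acc

# Tiny worked example: A = [[1,2],[3,4]], B = [[5,6],[7,8]].
pairs = []
pairs += mapper({0: 1.0, 1: 3.0}, {0: 5.0, 1: 6.0})  # i = 0
pairs += mapper({0: 2.0, 1: 4.0}, {0: 7.0, 1: 8.0})  # i = 1
product = combine_or_reduce(pairs)
# product holds A*B = [[19, 22], [43, 50]] row by row.
```

Each call to `mapper` emits two pairs (one per nonzero of the column), so even a single map task over one column/row file produces several key-value pairs for the combiner to accumulate.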
