Marc,

If you want to do element-at-a-time multiplication, without holding a whole row and a whole column in memory at once, that is totally doable, just not implemented in Mahout yet. The current implementation manages to do the multiply in a single map-reduce pass by doing a map-side join (the CompositeInputFormat thing), but in general, if you don't do a map-side join, it's two passes. In that case, working element at a time instead of row/column at a time is also two passes, and it places no restrictions on how much has to be in memory at once.
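To make that concrete, the two passes would look something like this (an untested sketch, not real Mahout code; the text-triple input format and all class names are just made up for illustration):

// Rough sketch: C = A * B done element-at-a-time in two Hadoop passes.
// Elements arrive as text triples: "A <i> <k> <value>" or "B <k> <j> <value>".
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class ElementWiseMultiply {

  // Pass 1 map: re-key every element by the shared inner index k,
  // since C[i][j] = sum over k of A[i][k] * B[k][j].
  public static class JoinMapper
      extends Mapper<LongWritable, Text, IntWritable, Text> {
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      String[] f = line.toString().split("\\s+");
      if ("A".equals(f[0])) {   // A[i][k] -> key k, value ("A", i, a_ik)
        ctx.write(new IntWritable(Integer.parseInt(f[2])),
                  new Text("A\t" + f[1] + "\t" + f[3]));
      } else {                  // B[k][j] -> key k, value ("B", j, b_kj)
        ctx.write(new IntWritable(Integer.parseInt(f[1])),
                  new Text("B\t" + f[2] + "\t" + f[3]));
      }
    }
  }

  // Pass 1 reduce: for each k, cross column k of A with row k of B,
  // emitting the partial products a_ik * b_kj keyed by (i, j). This
  // naive version buffers both sides; with a secondary sort (A values
  // first, spillable to local disk) even that buffering goes away.
  public static class JoinReducer
      extends Reducer<IntWritable, Text, Text, DoubleWritable> {
    @Override
    protected void reduce(IntWritable k, Iterable<Text> vals, Context ctx)
        throws IOException, InterruptedException {
      List<String[]> aSide = new ArrayList<String[]>(); // column k of A
      List<String[]> bSide = new ArrayList<String[]>(); // row k of B
      for (Text v : vals) {
        String[] f = v.toString().split("\t");
        ("A".equals(f[0]) ? aSide : bSide).add(f);
      }
      for (String[] a : aSide) {
        for (String[] b : bSide) {
          ctx.write(new Text(a[1] + "\t" + b[1]),       // key (i, j)
                    new DoubleWritable(Double.parseDouble(a[2])
                                       * Double.parseDouble(b[2])));
        }
      }
    }
  }

  // Pass 2 reduce (with an identity map over pass 1's output): sum the
  // partial products per (i, j) key to get C[i][j].
  public static class SumReducer
      extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
    @Override
    protected void reduce(Text ij, Iterable<DoubleWritable> partials,
                          Context ctx)
        throws IOException, InterruptedException {
      double sum = 0.0;
      for (DoubleWritable p : partials) {
        sum += p.get();
      }
      ctx.write(ij, new DoubleWritable(sum));
    }
  }
}

Note that pass 1's shuffle moves each input element only once, but its output is one partial product per (i, k) x (k, j) pair, which for dense matrices is exactly the huge intermediate data volume you mention below.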
I've had some code lying around which started on doing this, but never had a need for it just yet. If you open a JIRA ticket for this, I can post my code fragments so far, and maybe you (or someone else) could help finish it off.

Can you describe a bit how big your matrices are? Dense matrix multiplication is an O(N^3) operation, so if N is so large that even one row or column cannot fit in memory, then N^3 operations are not going to finish any time this year or next, from what I can tell.

  -jake

On Sat, Oct 1, 2011 at 3:18 AM, Marc Sturlese <[email protected]> wrote:

> Well, after digging into the code and doing some tests, I've seen that
> what I was asking for is not possible. Mahout will only let you do a
> distributed matrix multiplication of 2 sparse matrices, as the
> representation of a whole row or column has to fit in memory. Actually,
> a row and a column have to fit in memory at the same time (as it uses
> the CompositeInputFormat).
> To do dense matrix multiplication with Hadoop I just found this:
> http://homepage.mac.com/j.norstad/matrix-multiply/index.html
> But the data generated by the maps will be extremely huge and the job
> will take ages (depending, of course, on the number of nodes).
> I've seen around that Hama and R are possible solutions too. Any advice,
> comment or experience?
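P.S. To put a rough number on the O(N^3) point, a quick back-of-envelope (the matrix size, node count, and per-node speed below are all assumptions, not measurements):

// Assumed figures only: once a single dense row of doubles no longer
// fits in a few GB of RAM, N is on the order of 1e9.
public class DenseMultiplyEstimate {
  public static void main(String[] args) {
    double n = 1e9;                    // side of the square matrices
    double flops = 2.0 * n * n * n;    // multiply-adds for C = A * B
    double cluster = 1000 * 1e10;      // 1000 nodes at ~10 GFLOPS each
    double seconds = flops / cluster;  // ignores all shuffle I/O
    double years = seconds / (365.0 * 24 * 3600);
    System.out.printf("%.1e flops -> %.1e years%n", flops, years);
    // 2e27 / 1e13 = 2e14 seconds, i.e. roughly six million years.
  }
}

So at that scale no amount of map-reduce cleverness helps; the win has to come from sparsity or from a much smaller N.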
