Marc,

If you want to do element-at-a-time multiplication, without holding a whole row and a whole column in memory at once, that is totally doable, just not implemented in Mahout yet. The current implementation manages to do the multiply in a single map-reduce pass by doing a map-side join (the CompositeInputFormat thing), but in general, if you don't do a map-side join, it's two passes. In that case, working element at a time instead of row/column at a time is also two passes, and it places no restrictions on how much has to be in memory at once.
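To make that concrete, the two passes would look something like this (an untested sketch, not real Mahout code; the text-triple input format and all class names are just made up for illustration):

// Rough sketch: C = A * B done element-at-a-time in two Hadoop passes.
// Elements arrive as text triples: "A <i> <k> <value>" or "B <k> <j> <value>".
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class ElementWiseMultiply {

  // Pass 1 map: re-key every element by the shared inner index k,
  // since C[i][j] = sum over k of A[i][k] * B[k][j].
  public static class JoinMapper
      extends Mapper<LongWritable, Text, IntWritable, Text> {
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      String[] f = line.toString().split("\\s+");
      if ("A".equals(f[0])) {   // A[i][k] -> key k, value ("A", i, a_ik)
        ctx.write(new IntWritable(Integer.parseInt(f[2])),
                  new Text("A\t" + f[1] + "\t" + f[3]));
      } else {                  // B[k][j] -> key k, value ("B", j, b_kj)
        ctx.write(new IntWritable(Integer.parseInt(f[1])),
                  new Text("B\t" + f[2] + "\t" + f[3]));
      }
    }
  }

  // Pass 1 reduce: for each k, cross column k of A with row k of B,
  // emitting the partial products a_ik * b_kj keyed by (i, j). This
  // naive version buffers both sides; with a secondary sort (A values
  // first, spillable to local disk) even that buffering goes away.
  public static class JoinReducer
      extends Reducer<IntWritable, Text, Text, DoubleWritable> {
    @Override
    protected void reduce(IntWritable k, Iterable<Text> vals, Context ctx)
        throws IOException, InterruptedException {
      List<String[]> aSide = new ArrayList<String[]>(); // column k of A
      List<String[]> bSide = new ArrayList<String[]>(); // row k of B
      for (Text v : vals) {
        String[] f = v.toString().split("\t");
        ("A".equals(f[0]) ? aSide : bSide).add(f);
      }
      for (String[] a : aSide) {
        for (String[] b : bSide) {
          ctx.write(new Text(a[1] + "\t" + b[1]),       // key (i, j)
                    new DoubleWritable(Double.parseDouble(a[2])
                                       * Double.parseDouble(b[2])));
        }
      }
    }
  }

  // Pass 2 reduce (with an identity map over pass 1's output): sum the
  // partial products per (i, j) key to get C[i][j].
  public static class SumReducer
      extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
    @Override
    protected void reduce(Text ij, Iterable<DoubleWritable> partials,
                          Context ctx)
        throws IOException, InterruptedException {
      double sum = 0.0;
      for (DoubleWritable p : partials) {
        sum += p.get();
      }
      ctx.write(ij, new DoubleWritable(sum));
    }
  }
}

Note that pass 1's shuffle moves each input element only once, but its output is one partial product per (i, k) x (k, j) pair, which for dense matrices is exactly the huge intermediate data volume you mention below.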
I've had some code lying around which started on doing this, but never had a need for it just yet. If you open a JIRA ticket for this, I can post my code fragments so far, and maybe you (or someone else) could help finish it off.

Can you describe a bit how big your matrices are? Dense matrix multiplication is an O(N^3) operation, so if N is so large that even one row or column cannot fit in memory, then N^3 operations are not going to finish any time this year or next, from what I can tell.

  -jake

On Sat, Oct 1, 2011 at 3:18 AM, Marc Sturlese <[email protected]> wrote:

> Well, after digging into the code and doing some tests, I've seen that
> what I was asking for is not possible. Mahout will only let you do a
> distributed matrix multiplication of 2 sparse matrices, as the
> representation of a whole row or column has to fit in memory. Actually,
> a row and a column have to fit in memory at the same time (as it uses
> the CompositeInputFormat).
> To do dense matrix multiplication with Hadoop I just found this:
> http://homepage.mac.com/j.norstad/matrix-multiply/index.html
> But the data generated by the maps will be extremely huge and the job
> will take ages (depending, of course, on the number of nodes).
> I've seen around that Hama and R are possible solutions too. Any advice,
> comment or experience?
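P.S. To put a rough number on the O(N^3) point, a quick back-of-envelope (the matrix size, node count, and per-node speed below are all assumptions, not measurements):

// Assumed figures only: once a single dense row of doubles no longer
// fits in a few GB of RAM, N is on the order of 1e9.
public class DenseMultiplyEstimate {
  public static void main(String[] args) {
    double n = 1e9;                    // side of the square matrices
    double flops = 2.0 * n * n * n;    // multiply-adds for C = A * B
    double cluster = 1000 * 1e10;      // 1000 nodes at ~10 GFLOPS each
    double seconds = flops / cluster;  // ignores all shuffle I/O
    double years = seconds / (365.0 * 24 * 3600);
    System.out.printf("%.1e flops -> %.1e years%n", flops, years);
    // 2e27 / 1e13 = 2e14 seconds, i.e. roughly six million years.
  }
}

So at that scale no amount of map-reduce cleverness helps; the win has to come from sparsity or from a much smaller N.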
