BTW I have this working on trivial data and am in the process of measuring its results on some real-world data. It does a lot with DistributedRowMatrix, so I'll be interested to see how it performs with a larger data set.
Does anyone know of a public data set that provides things like views and purchases?

On Apr 8, 2013, at 2:31 PM, Ted Dunning <[email protected]> wrote:

On Sat, Apr 6, 2013 at 3:26 PM, Pat Ferrel <[email protected]> wrote:

> I guess I don't understand this issue.
>
> In my case both the item ids and user ids of the separate DistributedRowMatrix
> will match, and I know the size for the entire space from a previous
> step where I create id maps. I suppose you are saying the m/r code
> would be super simple if a row of B' and a column of A could be processed
> together, which I understand as an optimal implementation.

Well.... rows of B and A should match, so columns of B' and rows of A rather than the reverse.

> So calculating [B'A] seems like TransposeJob and MultiplyJob and does seem
> to work. You lose the ability to substitute different RowSimilarityJob
> measures. I assume this creates something like the co-occurrence similarity
> measure. But oh, well. Maybe I'll look at that later.

Yes. Exactly.

> I also see why you say the two matrices A and B don't have to have the
> same size, since [B'A]H_v = [B'A]A', so the dimensions will work out as long
> as the users dimension is the same throughout.

Yes. All we need is user id match.
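To make Ted's point concrete: because rows of B and A are keyed by the same user, B'A can be built in one pass over users, each user adding the outer product of their B row and their A row. Below is a minimal single-machine sketch of that arithmetic, assuming binary interaction data held as plain Python dicts. The function names `cross_cooccurrence` and `recommend` are hypothetical illustrations, not Mahout's TransposeJob/MultiplyJob API, and the scores are raw co-occurrence counts rather than any RowSimilarityJob measure.

```python
from collections import defaultdict
from typing import Dict, Set

# Hypothetical representation: user id -> set of item ids (a binary sparse matrix row).
# A and B share the user ids; their item sets (columns) can be entirely different.
Interactions = Dict[str, Set[str]]
Cooccurrence = Dict[str, Dict[str, int]]


def cross_cooccurrence(B: Interactions, A: Interactions) -> Cooccurrence:
    """Compute B'A as the sum over users u of outer(B[u], A[u]).

    Only the user dimension has to match; users present in just one
    matrix contribute nothing to the product.
    """
    result: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for user in B.keys() & A.keys():
        for b_item in B[user]:
            for a_item in A[user]:
                result[b_item][a_item] += 1
    # Convert nested defaultdicts to plain dicts for a clean return value.
    return {b_item: dict(row) for b_item, row in result.items()}


def recommend(btA: Cooccurrence, h_v: Set[str]) -> Dict[str, int]:
    """Score B-items for one user as [B'A] h_v, with h_v the user's binary A-history."""
    scores: Dict[str, int] = {}
    for b_item, row in btA.items():
        score = sum(count for a_item, count in row.items() if a_item in h_v)
        if score:
            scores[b_item] = score
    return scores


if __name__ == "__main__":
    purchases = {"u1": {"p1"}, "u2": {"p1", "p2"}, "u3": {"p2"}}
    views = {"u1": {"v1", "v2"}, "u2": {"v2"}, "u3": {"v3"}, "u4": {"v1"}}
    btA = cross_cooccurrence(purchases, views)
    print(btA)
    print(recommend(btA, {"v2"}))
```

With the toy data above, B'A comes out as {"p1": {"v1": 1, "v2": 2}, "p2": {"v2": 1, "v3": 1}}, and a user who has only viewed "v2" gets scores {"p1": 2, "p2": 1}. Calling `recommend` once per user's row of A is the column-by-column equivalent of [B'A]A'; user "u4" only affects that step, never B'A itself.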
