On Apr 4, 2013, at 5:17 PM, Pat Ferrel wrote:

> One issue with the method below is that the two source matrices would not
> have values for all users or items (rows or columns). I do know the entire
> user and item id space from a previous step so I know the # of rows including
> blank ones and # of columns even though some are empty. Put another way the
> Actual matrix (with empty rows or columns) may be larger than the number of
> rows in the DistributedRowMatrix or unique item ids. However all ids in one
> matrix will match the ids of the other matrix.

As I mentioned in the other response I sent, only the user ids need to match.

Any item whose column in B is all zero cannot be recommended, since we have never seen it. The math won't change.

Any item whose column in A is all zero cannot become an indicator. That probably doesn't matter either, since an item we have not yet seen is probably rare and, in any case, we know nothing about it.

Any zero row of either A or B will cause that user's behavior to be ignored for the purposes of cross recommendation. That is, again, as it should be, since any user who has not participated in both behaviors cannot provide information about the linkage.

> AFAICT this should be OK for the TransposeJob and MatrixMultJob but I haven't
> tested it. I will need to pass in the size of the matrices as the size of the
> user and item space, Correct?

Yes.
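
To make the zero row and zero column cases concrete, here is a small sketch in plain Python. It is not Mahout code. It assumes the cross-cooccurrence product is B'A, with users as rows and items as columns in both matrices, and the toy data and helper names (transpose, matmul) are made up for illustration. Both matrices are declared at the full size of the user and item space, with empty rows and columns left as zeros.

```python
def transpose(m):
    """Return the transpose of a dense matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(x, y):
    """Multiply two dense matrices stored as lists of rows."""
    yt = transpose(y)
    return [[sum(a * b for a, b in zip(row, col)) for col in yt] for row in x]


# Hypothetical toy data: 4 users (rows) and 3 items (columns) per behavior.
# The row index is the shared user id; the item ids need not be shared.
A = [
    [1, 0, 1],
    [0, 0, 1],
    [0, 0, 0],  # user 2 never took part in behavior A
    [1, 0, 0],
]               # column 1 of A is all zero: that item was never seen in A
B = [
    [1, 1, 0],
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 0],  # user 3 never took part in behavior B
]               # column 2 of B is all zero: that item was never seen in B

# Assumed cross-cooccurrence: rows are B items, columns are A items.
cross = matmul(transpose(B), A)

# Same product using only the users who took part in both behaviors.
active = [u for u in range(len(A)) if any(A[u]) and any(B[u])]
cross_active = matmul(transpose([B[u] for u in active]),
                      [A[u] for u in active])

print(cross)         # [[1, 0, 1], [1, 0, 2], [0, 0, 0]]
print(cross_active)  # identical to cross
```

Row 2 of the result is all zero, so the unseen B item can never be recommended. Column 1 is all zero, so the unseen A item can never act as an indicator. Dropping users 2 and 3 leaves the product unchanged, which is the sense in which their behavior is ignored.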
