The row similarity downsampling is just a matter of dropping elements at
random from rows that have more data than we want.
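
As a rough illustration of that idea (a minimal sketch only, not the actual
Mahout change; the dict-of-dicts matrix layout and the names `downsample_row`
and `max_per_row` are assumptions made for this example):

```python
import random


def downsample_row(row, max_per_row, rng=random):
    """Keep at most max_per_row elements of a sparse row, chosen uniformly at random.

    `row` is a hypothetical sparse row represented as {column_id: value}.
    Rows that are already small enough are returned unchanged (as a copy).
    """
    if len(row) <= max_per_row:
        return dict(row)
    # sorted() gives a stable list of column ids to sample from
    kept = rng.sample(sorted(row), max_per_row)
    return {col: row[col] for col in kept}


# Toy matrix: one short row, one row with more data than we want.
rng = random.Random(42)
matrix = {
    "u1": {"a": 1.0, "b": 1.0},
    "u2": {c: 1.0 for c in "abcdefgh"},
}
sampled = {row_id: downsample_row(row, 3, rng) for row_id, row in matrix.items()}
```

Short rows pass through untouched, so only the heavy rows lose elements.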

If the join that puts the row together can handle two kinds of input, then
RowSimilarity can be easily modified to be CrossRowSimilarity.  Likewise,
if we have two DRMs with the same row IDs in the same order, we can do a
map-side merge.  Such a merge can be very efficient on a system like MapR,
where you can arrange for the files to live on the same nodes.
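
The merge itself can be sketched like this (an illustration under assumptions,
not Mahout or MapR code: each DRM is modeled as an iterable of
`(row_id, vector)` pairs, and `merge_aligned_rows` is a made-up name). Because
both inputs share row IDs and ordering, one pass over both streams is enough
and no shuffle or sort is needed:

```python
_MISSING = object()


def merge_aligned_rows(rows_a, rows_b):
    """Pair up rows from two inputs that have identical row ids in identical order.

    Each input is an iterable of (row_id, vector) pairs. Yields
    (row_id, vector_a, vector_b). Raises ValueError if the inputs are not
    aligned, since the single-pass merge is only valid under that assumption.
    """
    iter_b = iter(rows_b)
    for row_id_a, vec_a in rows_a:
        item_b = next(iter_b, _MISSING)
        if item_b is _MISSING:
            raise ValueError("second input ended before row %r" % (row_id_a,))
        row_id_b, vec_b = item_b
        if row_id_a != row_id_b:
            raise ValueError("row ids differ: %r vs %r" % (row_id_a, row_id_b))
        yield row_id_a, vec_a, vec_b
    if next(iter_b, _MISSING) is not _MISSING:
        raise ValueError("second input has extra rows")


# Two toy "DRMs" with the same row ids in the same order.
drm_a = [("u1", {"a": 1.0}), ("u2", {"b": 1.0})]
drm_b = [("u1", {"x": 1.0}), ("u2", {"y": 1.0})]
merged = list(merge_aligned_rows(drm_a, drm_b))
```

The merged triples are what a cross-similarity step would then consume, with
one vector from each input per row.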


On Sun, Jul 21, 2013 at 4:43 PM, Pat Ferrel <[email protected]> wrote:

> RowSimilarity downsampling? Are you referring to a mod of the matrix
> multiply to do cross similarity with LLR for the cross recommendations? So
> similarity of rows of B with rows of A?
>
> Sounds like you are proposing not only putting a recommender in Solr but
> also a cross-recommender? This is why getting a real data set is
> problematic?
>
> On Jul 21, 2013, at 3:40 PM, Ted Dunning <[email protected]> wrote:
>
> Pat,
>
> Yes.  The first part probably just is the RowSimilarity job, especially
> after Sebastian puts in the down-sampling.
>
> The new part is exactly as you say, storing the DRM into Solr indexes.
>
> There is no reason to not use a real data set.  There is a strong reason to
> use a synthetic dataset, however, in that it can be trivially scaled up and
> down both in items and users.  Also, the synthetic dataset doesn't require
> that the real data be found and downloaded.
>
>
>
> On Sun, Jul 21, 2013 at 2:17 PM, Pat Ferrel <[email protected]> wrote:
>
> > Read the paper, and the preso.
> >
> > As to the 'offline to Solr' part. It sounds like you are suggesting an
> > item-item similarity matrix be stored and indexed in Solr. One would have
> > to create the action matrix from user profile data (preference history), do
> > a RowSimilarity job on it (using LLR similarity) and move the result to
> > Solr. The first part of this is nearly identical to the current recommender
> > job workflow and could pretty easily be created from it I think. The new
> > part is taking the DistributedRowMatrix and storing it in a particular way
> > in Solr, right?
> >
> > BTW Is there some reason not to use an existing real data set?
> >
> > On Jul 19, 2013, at 3:45 PM, Ted Dunning <[email protected]> wrote:
> >
> > OK.  I think the crux here is the off-line to Solr part so let's see who
> > else pops up.
> >
> > Having a Solr maven could be very helpful.
> >
> >
> >
>
>
