At the moment, the downsampling is done by PreparePreferenceMatrixJob
for the collaborative filtering functionality. We just want to move it
down to RowSimilarityJob to enable standalone usage.

I think that the CrossRecommender should be the next thing on our
agenda, after we have the deployment infrastructure. I especially like
that it can include different kinds of interactions, as opposed
to most other (academically motivated) recommenders that focus on a
single interaction type like a rating.

--sebastian

On 22.07.2013 02:14, Ted Dunning wrote:
> The row similarity downsampling is just a matter of dropping elements at
> random from rows that have more data than we want.
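A minimal sketch of the downsampling described above, written in Python for brevity. It assumes each row is a sparse dict of column index to value, and that rows with more non-zeros than a hypothetical cap `max_per_row` keep a uniformly random subset of their entries. The function names and the cap are illustrative assumptions, not Mahout's RowSimilarityJob API.

```python
import random


def downsample_row(row, max_per_row, rng):
    """Keep at most max_per_row entries of a sparse row, chosen uniformly at random.

    row: dict mapping column index -> value (hypothetical sparse-row stand-in).
    Rows at or under the cap are returned unchanged (as a copy).
    """
    if len(row) <= max_per_row:
        return dict(row)
    # Sample columns without replacement; sorting first makes a seeded run reproducible.
    kept = rng.sample(sorted(row), max_per_row)
    return {col: row[col] for col in kept}


def downsample_matrix(rows, max_per_row, seed=42):
    """Apply downsample_row to every row of a {row_id: {col: value}} matrix."""
    rng = random.Random(seed)
    return {rid: downsample_row(r, max_per_row, rng) for rid, r in rows.items()}


matrix = {0: {c: 1.0 for c in range(10)}, 1: {3: 1.0, 7: 1.0}}
small = downsample_matrix(matrix, max_per_row=4)
print({rid: len(r) for rid, r in small.items()})  # {0: 4, 1: 2}
```

Capping rows this way bounds the quadratic number of co-occurrence pairs that a single very dense row can generate in the pairwise step that follows.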
> 
> If the join that puts the row together can handle two kinds of input, then
> RowSimilarity can be easily modified to be CrossRowSimilarity.  Likewise,
> if we have two DRMs with the same row IDs in the same order, we can do a
> map-side merge. Such a merge can be very efficient on a system like MapR,
> where you can control file placement so that the files live on the same nodes.
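A sketch of the map-side merge under the stated precondition: both DRMs list the same row IDs in the same order. Each input is modeled as an iterable of `(row_id, row)` pairs, standing in for the splits a mapper would read. `map_side_merge` is a hypothetical name. Because the inputs are already aligned, the merge is a single lockstep pass with no shuffle, and it fails loudly if the alignment assumption is violated.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel marking an exhausted input


def map_side_merge(a_rows, b_rows):
    """Yield (row_id, a_row, b_row) by walking two row streams in lockstep.

    Assumes both streams contain the same row ids in the same order, which is
    what lets the merge happen inside the mapper without a shuffle.
    """
    for a, b in zip_longest(a_rows, b_rows, fillvalue=_MISSING):
        if a is _MISSING or b is _MISSING:
            raise ValueError("inputs have different numbers of rows")
        (a_id, a_row), (b_id, b_row) = a, b
        if a_id != b_id:
            raise ValueError(f"row id mismatch: {a_id!r} != {b_id!r}")
        yield a_id, a_row, b_row


A = [(0, {1: 1.0}), (1, {2: 1.0})]
B = [(0, {5: 1.0}), (1, {6: 1.0, 7: 1.0})]
for row_id, a_row, b_row in map_side_merge(A, B):
    print(row_id, a_row, b_row)
```

Each input is read once, sequentially. That is why co-locating the two files on the same nodes, as with MapR above, makes this cheap.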
> 
> 
> On Sun, Jul 21, 2013 at 4:43 PM, Pat Ferrel wrote:
> 
>> RowSimilarity downsampling? Are you referring to the modification of the
>> matrix multiply that computes cross similarity with LLR for the cross
>> recommendations? That is, the similarity of rows of B with rows of A?
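A brute-force sketch of cross similarity scored with LLR. It assumes each item's row is held in memory as the set of users who performed that action: `a_items` holds the first interaction type and `b_items` the second. The `llr` function is the standard G² statistic for a 2x2 contingency table, computed from unnormalized entropies. The loop over all item pairs stands in for the distributed A'B multiply, and all names here are hypothetical.

```python
import math
from itertools import product


def x_log_x(x):
    """x * ln(x), with the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized Shannon entropy: N ln N - sum(k ln k)."""
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off.
    return max(0.0, 2.0 * (row + col - mat))


def cross_llr(a_items, b_items, num_users):
    """LLR score for every (item in A, item in B) pair.

    a_items, b_items: dict mapping item id -> set of user ids (hypothetical
    in-memory stand-in for the rows of the two interaction matrices).
    """
    scores = {}
    for (i, users_i), (j, users_j) in product(a_items.items(), b_items.items()):
        k11 = len(users_i & users_j)       # users who did both
        k12 = len(users_j - users_i)       # did j in B but not i in A
        k21 = len(users_i - users_j)       # did i in A but not j in B
        k22 = num_users - k11 - k12 - k21  # did neither
        scores[(i, j)] = llr(k11, k12, k21, k22)
    return scores


purchases = {"p1": {0, 1, 2}, "p2": {3}}
views = {"v1": {0, 1, 2}, "v2": {3, 4}}
scores = cross_llr(purchases, views, num_users=6)
print(round(scores[("p1", "v1")], 3))  # 8.318 (= 12 ln 2)
```

LLR measures surprise in either direction, so a pair that never co-occurs can still score above zero. A sparse A'B product only emits pairs with k11 > 0, which sidesteps that.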
>>
>> It sounds like you are proposing to put not only a recommender in Solr but
>> also a cross-recommender. Is this why getting a real data set is
>> problematic?
>>
>> On Jul 21, 2013, at 3:40 PM, Ted Dunning wrote:
>>
>> Pat,
>>
>> Yes. The first part is probably just the RowSimilarity job, especially
>> after Sebastian puts in the downsampling.
>>
>> The new part is exactly as you say, storing the DRM into Solr indexes.
>>
>> There is no reason not to use a real data set. There is a strong reason to
>> use a synthetic data set, however: it can be trivially scaled up and down
>> in both items and users. Also, a synthetic data set doesn't require finding
>> and downloading real data.
>>
>>
>>
>> On Sun, Jul 21, 2013 at 2:17 PM, Pat Ferrel <[email protected]> wrote:
>>
>>> Read the paper and the preso.
>>>
>>> As to the 'offline to Solr' part: it sounds like you are suggesting that
>>> an item-item similarity matrix be stored and indexed in Solr. One would
>>> have to create the action matrix from user profile data (preference
>>> history), run a RowSimilarity job on it (using LLR similarity), and move
>>> the result to Solr. The first part of this is nearly identical to the
>>> current recommender job workflow and could pretty easily be created from
>>> it, I think. The new part is taking the DistributedRowMatrix and storing
>>> it in a particular way in Solr, right?
>>>
>>> BTW, is there some reason not to use an existing real data set?
>>>
>>> On Jul 19, 2013, at 3:45 PM, Ted Dunning wrote:
>>>
>>> OK. I think the crux here is the offline-to-Solr part, so let's see who
>>> else pops up.
>>>
>>> Having a Solr maven could be very helpful.
>>>
>>>
>>>
>>
>>
> 