Re: Generating similarity file(s) for item recommender?

Sean Owen Tue, 03 Jul 2012 07:41:24 -0700

I'm not sure Mridul's suggestion does what you want. Do you want to
recommend items to users? If so, then no, you do not start with item IDs
and recommend to them.
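
Here is a minimal plain-Java sketch of the batch loop done the other way round. It iterates over user IDs (not item IDs) and, for each user, ranks the items they haven't seen by their summed similarity to the items they have. Everything here is hypothetical illustration, not Mahout code: the class name PerUserTopN, the in-memory maps, and the simple summed score (Mahout's item-based recommender estimates scores differently). With Mahout itself, the loop body would roughly be a call to Recommender.recommend(userID, howMany) for each ID from DataModel.getUserIDs().

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, self-contained sketch (NOT Mahout's API): boolean user->item data,
// with item-item similarities assumed to be precomputed elsewhere.
final class PerUserTopN {

    /** Scores every item the user has not seen by summing its similarity to the user's items. */
    static List<Long> recommend(Set<Long> userItems,
                                Map<Long, Map<Long, Double>> itemSimilarities,
                                int howMany) {
        final Map<Long, Double> scores = new HashMap<>();
        for (long owned : userItems) {
            Map<Long, Double> neighbours = itemSimilarities.get(owned);
            if (neighbours == null) continue;
            for (Map.Entry<Long, Double> e : neighbours.entrySet()) {
                long candidate = e.getKey();
                if (userItems.contains(candidate)) continue; // skip items the user already has
                scores.merge(candidate, e.getValue(), Double::sum);
            }
        }
        List<Long> ranked = new ArrayList<>(scores.keySet());
        // Highest score first; ties broken by item ID so the output is deterministic.
        ranked.sort((a, b) -> {
            int byScore = Double.compare(scores.get(b), scores.get(a));
            return byScore != 0 ? byScore : Long.compare(a, b);
        });
        return ranked.subList(0, Math.min(howMany, ranked.size()));
    }

    /** Batch job: loop over USER IDs (not item IDs) and keep the top-N items per user. */
    static Map<Long, List<Long>> recommendForAllUsers(Map<Long, Set<Long>> prefsByUser,
                                                      Map<Long, Map<Long, Double>> itemSimilarities,
                                                      int howMany) {
        Map<Long, List<Long>> out = new HashMap<>();
        for (Map.Entry<Long, Set<Long>> e : prefsByUser.entrySet()) {
            out.put(e.getKey(), recommend(e.getValue(), itemSimilarities, howMany));
        }
        return out;
    }
}
```

The output is keyed by user, which is what "recommend items to users" needs; keying the batch by item ID would answer a different question (which items resemble this one).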

It sounds like your question is how to compute similarity data. The
first answer is that you do not use Hadoop unless you must use Hadoop.

You don't compute it yourself; you let the framework do it with
LogLikelihoodSimilarity, and it just happens automatically. You can add
caching or precomputation, but that comes after you decide that you
have too much data to do it all in real time.
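
For intuition about what LogLikelihoodSimilarity computes when it "just happens automatically", here is a minimal plain-Java sketch of a log-likelihood item similarity on boolean data. It builds the 2x2 table of users who have both items, only one of them, or neither, scores that table with Dunning's log-likelihood ratio, and squashes the result into [0, 1) with 1 - 1/(1 + LLR). As far as I know, that last step mirrors the transform LogLikelihoodSimilarity applies, but treat it as an assumption. The class name LogLikelihoodSketch is hypothetical; in practice you would hand a LogLikelihoodSimilarity to the recommender rather than reimplement this.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of a log-likelihood item similarity over boolean
// user->item data. Plain Java written for this reply, not Mahout's source.
final class LogLikelihoodSketch {

    private static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    // Unnormalised Shannon entropy: xLogX(total) minus the xLogX of each count.
    private static double entropy(long... counts) {
        long sum = 0;
        double result = 0.0;
        for (long c : counts) {
            result -= xLogX(c);
            sum += c;
        }
        return result + xLogX(sum);
    }

    /** Dunning's log-likelihood ratio for the 2x2 table [[k11, k12], [k21, k22]]. */
    static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double columnEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        // Clamp tiny negative values caused by floating-point round-off.
        return Math.max(0.0, 2.0 * (rowEntropy + columnEntropy - matrixEntropy));
    }

    /** Similarity in [0, 1) between items A and B, given the users who interacted with each. */
    static double similarity(Set<Long> usersOfA, Set<Long> usersOfB, long numUsers) {
        Set<Long> both = new HashSet<>(usersOfA);
        both.retainAll(usersOfB);
        long k11 = both.size();                  // users with both A and B
        long k12 = usersOfA.size() - k11;        // users with A only
        long k21 = usersOfB.size() - k11;        // users with B only
        long k22 = numUsers - k11 - k12 - k21;   // users with neither
        double llr = logLikelihoodRatio(k11, k12, k21, k22);
        return 1.0 - 1.0 / (1.0 + llr);
    }
}
```

When co-occurrence is exactly what independence predicts (for example all four counts equal), the ratio is 0 and the similarity is 0. Items that almost always appear together push it toward 1.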

1GB of input data suggests you have a lot of data. Is that tens of
millions of user-item associations? If so, then yes, you are no longer in
simple non-Hadoop land, and you need to look at RecommenderJob /
Hadoop. That path doesn't involve FileDataModel or the other
non-distributed bits.

To your second point: this is really what Rescorer does for you. It
lets you filter or boost certain results at query time. But it is
part of the non-distributed code. You could try stitching together
some offline similarities from the Hadoop job and loading them
selectively into memory as part of the real-time Recommender, but it's
going to be a bit dicey to get that working fast.
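
In real Mahout code you would implement org.apache.mahout.cf.taste.recommender.IDRescorer (its methods are isFiltered(long) and rescore(long, double)) and pass it to recommend(userID, howMany, rescorer). The plain-Java sketch below only reproduces that shape so it runs without Mahout. It restricts results to a known candidate set, such as the item IDs matching a "gaming system" search. The names ItemRescorer, CandidateSetRescorer and QueryTimeFiltering are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in modelled on the shape of Mahout's IDRescorer
// (isFiltered / rescore). It is NOT the Mahout interface itself.
interface ItemRescorer {
    boolean isFiltered(long itemID);
    double rescore(long itemID, double originalScore);
}

/** Keeps only items from a candidate set, e.g. the IDs returned by a search. */
final class CandidateSetRescorer implements ItemRescorer {
    private final Set<Long> allowed;

    CandidateSetRescorer(Set<Long> allowed) {
        this.allowed = new HashSet<>(allowed);
    }

    @Override
    public boolean isFiltered(long itemID) {
        return !allowed.contains(itemID); // drop anything outside the candidate set
    }

    @Override
    public double rescore(long itemID, double originalScore) {
        return originalScore; // filter only; a boost would adjust the score here
    }
}

final class QueryTimeFiltering {
    /** Applies a rescorer to already-scored items and returns the top item IDs. */
    static List<Long> topN(Map<Long, Double> scored, ItemRescorer rescorer, int howMany) {
        final Map<Long, Double> kept = new LinkedHashMap<>();
        for (Map.Entry<Long, Double> e : scored.entrySet()) {
            long id = e.getKey();
            if (rescorer.isFiltered(id)) continue;
            kept.put(id, rescorer.rescore(id, e.getValue()));
        }
        List<Long> ids = new ArrayList<>(kept.keySet());
        ids.sort((a, b) -> Double.compare(kept.get(b), kept.get(a))); // highest score first
        return ids.subList(0, Math.min(howMany, ids.size()));
    }
}
```

In a real recommender it is cheaper to consult isFiltered before estimating a score at all. The sketch takes pre-scored input only to stay short.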

I don't mind mentioning that this is exactly the kind of problem I'm
working on in Myrrix (myrrix.com). It does the offline model building
on Hadoop and still lets you do real-time recommendations, with
Rescorer objects if you want. The whole point is to fix up this
"dicey" hard part mentioned above. You might take a look.

On Tue, Jul 3, 2012 at 3:15 PM, Matt Mitchell wrote:
> Thanks Mridul, I'll try this out. Does getItemIDs return every item id
> from the file in your example?
>
> This kind of leads me to another, related question... I want to have
> my recommender engine recommend items to a user, but the items should
> be from a known set of item ids. For example, if a user is doing a
> search for "gaming system", I only want recommendations for "gaming
> system" items. I was thinking I could feed the recommendation engine a
> set of item IDs that are known to be "gaming systems" as a candidate
> set *when executing that actual recommendation*. Does this make sense?
> If so, do you know how I can do this? I basically want to constrain
> the recommendations to a set of known item IDs at recommendation time.
>
> Thanks again!
>
> - Matt
>
> On Tue, Jul 3, 2012 at 8:01 AM, Mridul Kapoor wrote:
>>> I'm thinking the session ID (in the cookie) would be used as the user ID.
>>> The events are tied to product IDs, so these would be used in generating
>>> the preferences.
>>
>>
>> I guess that works if you consider product preferences on a per-session
>> basis (i.e. only items that a user expresses a preference for within a
>> single session are similar to each other, in one way or another). That
>> way, you would be treating the session IDs as dummy user IDs, which I
>> think should be fine.
>>
>>
>>> I'd like to eventually run this on Hadoop, but it'd also be nice to know if
>>> there is a way to do this locally, while developing the app, maybe using a
>>> smaller dataset?
>>>
>>
>> Yes, just writing a small offline recommender (made to run on a local
>> machine) should do; you could take a subset of the data, use a
>> FileDataModel, and then do something like
>>
>> LongPrimitiveIterator itemIDs = dataModel.getItemIDs();
>>
>>
>> and iterate over these, getting _n_ recommended items for each and storing
>> them somewhere (and maybe using them to evaluate the recommender somehow).
>>
>> Best,
>> Mridul
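
As a sketch of the session-as-dummy-user idea quoted above, the plain-Java code below gives each session cookie value a sequential numeric user ID, since Taste's DataModel works with long user and item IDs. It then emits one de-duplicated "userID,itemID" line per session/product pair. As I understand it, FileDataModel can read that comma-separated shape as boolean (no-value) preferences. The class and field names are hypothetical, and product IDs are assumed to be numeric already. A real deployment would also need to persist the session-to-ID mapping.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical event shape: one product interaction within one browsing session.
final class SessionEvent {
    final String sessionId; // e.g. the value of the session cookie
    final long productId;   // assumed to be numeric already

    SessionEvent(String sessionId, long productId) {
        this.sessionId = sessionId;
        this.productId = productId;
    }
}

// Hypothetical helper that treats each session as a "dummy user".
final class SessionPreferences {
    private final Map<String, Long> userIdBySession = new HashMap<>();

    /** Maps a session string to a stable numeric user ID, allocating one on first sight. */
    long userIdFor(String sessionId) {
        Long id = userIdBySession.get(sessionId);
        if (id == null) {
            id = (long) userIdBySession.size() + 1;
            userIdBySession.put(sessionId, id);
        }
        return id;
    }

    /** Turns raw events into de-duplicated "userID,itemID" lines (boolean preferences). */
    List<String> toPreferenceLines(List<SessionEvent> events) {
        Set<String> lines = new LinkedHashSet<>(); // keeps first-seen order, drops repeats
        for (SessionEvent e : events) {
            lines.add(userIdFor(e.sessionId) + "," + e.productId);
        }
        return new ArrayList<>(lines);
    }
}
```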
