Re: Performance issue with Item-based Recommendation and User-based Recommendation

Sean Owen Thu, 21 Jun 2012 15:55:44 -0700

OK, you're already pruning a fair bit then, in the sense that you keep
only the top 50 similarities (by absolute value) per item. Pruning more
is probably not productive, as you're already keeping only a small
fraction of all of them.
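
For reference, the threshold pruning suggested earlier in this thread
(drop similarities near 0, treat missing ones as 0 at runtime) could be
sketched roughly as below. This is plain Java, not Mahout's API: the
class name PrunedSimilarities, its methods, and the 0.1 threshold are
made up for illustration. The boxed HashMap is only for readability and
would use far more than ~20 bytes per pair, so at 100M pairs a
primitive-keyed structure would be needed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: keep only similarities whose absolute value reaches
// a threshold, and answer 0.0 for every pair that was pruned or never seen.
class PrunedSimilarities {
    private final Map<Long, Map<Long, Double>> kept = new HashMap<>();
    private final double minAbsSimilarity;

    PrunedSimilarities(double minAbsSimilarity) {
        this.minAbsSimilarity = minAbsSimilarity;
    }

    // Stores the pair only if it passes the threshold; returns true if kept.
    boolean offer(long itemA, long itemB, double value) {
        if (Math.abs(value) < minAbsSimilarity) {
            return false;
        }
        long lo = Math.min(itemA, itemB);
        long hi = Math.max(itemA, itemB);
        kept.computeIfAbsent(lo, k -> new HashMap<>()).put(hi, value);
        return true;
    }

    // Symmetric lookup; a missing pair is treated as similarity 0.
    double similarity(long itemA, long itemB) {
        long lo = Math.min(itemA, itemB);
        long hi = Math.max(itemA, itemB);
        Map<Long, Double> row = kept.get(lo);
        return row == null ? 0.0 : row.getOrDefault(hi, 0.0);
    }
}
```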

(100M pairs at ~20 bytes per pair should fit in about 2GB of heap.
That's a lot of the 4GB you have available, but it seems like it ought
to just about fit. Are you giving Java enough heap? Here are my general
default settings for this kind of app, which apply here too:
http://myrrix.com/documentation-serving-layer/)
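
The arithmetic behind that estimate, as a small sketch. The 20 bytes per
pair is the rough figure from above rather than a measured value, and
the class and method names are made up for illustration. The heap limit
the JVM reports here is what was set with -Xmx (or the platform default
if it was not set), which is worth checking before concluding the data
cannot fit.

```java
// Back-of-the-envelope heap estimate: pairs * bytesPerPair, compared with
// the maximum heap this JVM is allowed to use.
class HeapEstimate {
    // multiplyExact throws on overflow instead of silently wrapping.
    static long estimateBytes(long pairs, long bytesPerPair) {
        return Math.multiplyExact(pairs, bytesPerPair);
    }

    // Decimal gigabytes (1 GB = 10^9 bytes), matching the rough estimate.
    static double toDecimalGigabytes(long bytes) {
        return bytes / 1e9;
    }

    public static void main(String[] args) {
        long needed = estimateBytes(100_000_000L, 20L);
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.printf("Estimated need: %.1f GB, JVM max heap: %.1f GB%n",
                toDecimalGigabytes(needed), toDecimalGigabytes(maxHeap));
    }
}
```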

You just have a lot of items. Any process that scales as the square
of the number of items is going to hurt when you get to millions of
them. A process based on user-user similarity, when there are 40M
users, is only going to be much worse.

Consider not pre-computing all these pairs. Instead, compute them on
demand and cache them at runtime, and use a CandidateItemsStrategy to
significantly reduce the number of item-item similarities you need to
look at. That may mitigate the fact that you don't have them all in
memory.
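
A minimal sketch of the compute-on-demand-and-cache idea, assuming a
bounded LRU cache in front of whatever function actually computes a
similarity. This is plain Java and does not use Mahout's classes; the
names LazySimilarityCache and PairSimilarity are hypothetical, and the
cache size is something you would tune to your heap.

```java
import java.util.AbstractMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: compute a pairwise similarity only when it is first
// requested, and keep at most maxEntries results in an LRU cache.
class LazySimilarityCache {
    interface PairSimilarity {
        double similarity(long itemA, long itemB);
    }

    private final PairSimilarity delegate;
    private final Map<Map.Entry<Long, Long>, Double> cache;
    private int computations = 0;

    LazySimilarityCache(PairSimilarity delegate, final int maxEntries) {
        this.delegate = delegate;
        // accessOrder = true makes iteration order least-recently-used first,
        // so removeEldestEntry evicts the least recently used pair.
        this.cache = new LinkedHashMap<Map.Entry<Long, Long>, Double>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Map.Entry<Long, Long>, Double> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // Similarity is assumed symmetric, so (a, b) and (b, a) share one entry.
    synchronized double similarity(long itemA, long itemB) {
        Map.Entry<Long, Long> key = new AbstractMap.SimpleImmutableEntry<>(
                Math.min(itemA, itemB), Math.max(itemA, itemB));
        Double value = cache.get(key);
        if (value == null) {
            value = delegate.similarity(key.getKey(), key.getValue());
            computations++;
            cache.put(key, value);
        }
        return value;
    }

    // Number of times the underlying computation actually ran.
    synchronized int computations() {
        return computations;
    }
}
```

The candidate strategy matters here because it decides how many distinct
pairs get requested per recommendation; the fewer candidates, the more
of the working set a cache of a given size can hold.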

You can throw more hardware at this, if you're willing to move to a
completely batch-oriented Hadoop-based computation. You won't be
limited by RAM but it will be an offline process.

I am a big fan of matrix-factorization-based approaches at the moment,
since you can run most of the computation offline whenever you like but
still make approximate updates in real time. These sorts of things only
scale linearly with the number of items and users, and not even with
the size of the preference input. I think you may have to shoot for
this kind of hybrid system in the end to do updates in real time.

On Thu, Jun 21, 2012 at 11:26 PM, Way Cool <[email protected]> wrote:
> Thanks guys for your quick response.
>
> We have a couple million items and 40 million users (including
> anonymous users). Up to 50 similar items were generated per item.
>
> I will try a minimum similarity. Is there any documentation for it, or a
> parameter defined in the itemsimilarity job?
>
> What about user-based recommendation? Any ideas how we can make that happen
> without loading everything in memory?
>
> Thanks.
>
>
> On Thu, Jun 21, 2012 at 3:29 PM, Sean Owen <[email protected]> wrote:
>
>> I would suggest pruning similarities near 0, and then treating missing
>> similarities as 0 later at runtime. It may take a bit of coding. But
>> you should be able to throw away a lot without compromising much of
>> the result.
>>
>> On Thu, Jun 21, 2012 at 10:16 PM, Way Cool <[email protected]> wrote:
>> > Hi, guys,
>> >
>> > For item-based recommendation, I pre-calculated the item similarities on
>> > Hadoop per algorithm, which generated 20m rows each. The problem now is I
>> > can't just load them into memory via MySQLJDBCInMemoryItemSimilarity with
>> > 4GB memory. I tried MySQLJDBCItemSimilarity, however it's way too slow.
>> > What are the alternatives?
>> >
>> > For user-based recommendation, I can't load 100m lines of data model from
>> > FileDataModel into memory. It ran out of memory after 20m lines.
>> > JDBCDataModel has the same issue as before: it is way too slow. Does
>> > anyone precalculate the user similarities beforehand and then recommend
>> > items to a user?
>> >
>> > Has anyone had similar issues before?
>> >
>> > Thanks,
>> >
>> > YG
>>
