What exactly is your use case that you have millions of items but only 4 GB of RAM on the server? Curious :)
On 22.06.2012 00:26, Way Cool wrote:
> Thanks guys for your quick response.
>
> We have a couple millions of items and 40 millions users (including
> anonymous users). Up to 50 items were generated per item.
>
> I will try minimum similarity. Is there any document or a parameter defined
> in itemsimilarity job?
>
> What about user-based recommendation? Any ideas how we can make that happen
> without loading everything in memory?
>
> Thanks.
>
>
> On Thu, Jun 21, 2012 at 3:29 PM, Sean Owen <[email protected]> wrote:
>
>> I would suggest pruning similarities near 0, and then treating missing
>> similarities as 0 later at runtime. It may take a bit of coding. But
>> you should be able to throw away a lot without compromising much of
>> the result.
>>
>> On Thu, Jun 21, 2012 at 10:16 PM, Way Cool <[email protected]> wrote:
>>> Hi, guys,
>>>
>>> For item-based recommendation, I pre-calculated the item similarities on
>>> Hadoop per algorithm, which generated 20m rows each. The problem now is I
>>> can't just load them into memory via MySQLJDBCInMemoryItemSimilarity with
>>> 4GB memory. I tried MySQLJDBCItemSimilarity, however it's way too slow.
>>> What are the alternatives?
>>>
>>> For user-based recommendation, I can't load 100m lines of data model from
>>> FileDataModel into memory. It ran out of memory after 20m lines. The same
>>> issue with JDBCDataModel is way too slow. Does anyone precalculate the
>>> user similarities before and recommend items to a user?
>>>
>>> Anyone had the similar issues before?
>>>
>>> Thanks,
>>>
>>> YG
>>
>
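
For what it's worth, here is a rough sketch of what Sean's suggestion (prune similarities near 0, treat missing ones as 0 at runtime) could look like. It is plain Java and deliberately does not use any Mahout classes: the class name PrunedItemSimilarity, the threshold parameter, and the choice to return 1.0 for an item compared with itself are all my own assumptions for illustration, not anything the itemsimilarity job or the Taste API provides.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Mahout: a sparse in-memory store that keeps
// only item-item similarities whose magnitude reaches a chosen threshold.
final class PrunedItemSimilarity {
    // Outer key: smaller item ID of a pair; inner key: larger item ID.
    private final Map<Long, Map<Long, Float>> rows = new HashMap<>();
    private final double minAbsSimilarity;

    PrunedItemSimilarity(double minAbsSimilarity) {
        this.minAbsSimilarity = minAbsSimilarity;
    }

    // Offer one precomputed pair. Pairs near 0 are dropped; returns true if kept.
    boolean add(long itemA, long itemB, double similarity) {
        if (Math.abs(similarity) < minAbsSimilarity) {
            return false;
        }
        // Similarity is symmetric, so each pair is stored once.
        long lo = Math.min(itemA, itemB);
        long hi = Math.max(itemA, itemB);
        rows.computeIfAbsent(lo, k -> new HashMap<>()).put(hi, (float) similarity);
        return true;
    }

    // Look up a pair at runtime; anything that was pruned or never seen is 0.
    double similarity(long itemA, long itemB) {
        if (itemA == itemB) {
            return 1.0; // assumption: an item is fully similar to itself
        }
        Map<Long, Float> row = rows.get(Math.min(itemA, itemB));
        if (row == null) {
            return 0.0;
        }
        Float value = row.get(Math.max(itemA, itemB));
        return value == null ? 0.0 : value;
    }

    // Number of pairs that survived pruning.
    int size() {
        int count = 0;
        for (Map<Long, Float> row : rows.values()) {
            count += row.size();
        }
        return count;
    }
}
```

You would feed it the rows from the Hadoop output once at startup and then call similarity(a, b) from the recommender. Note that boxed HashMaps carry a lot of per-entry overhead, so at 20m rows a primitive-keyed map would probably be needed to actually fit in 4 GB; the sketch only shows the pruning and default-to-0 logic.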
