Could anyone give me some advice, recommendations, or point me to the usual ways to do this?
I am trying to get all (probably the top 100) product recommendations for each user from a model (MatrixFactorizationModel), but I haven't figured out how to do it efficiently yet.

So far, calling the predict method (predictAll in PySpark) with the whole user-product matrix uses too much memory and fails before completing, while calling predict for each user (or for each small group of users, say 100 or so) takes too much time to get all the recommendations.

I am using Spark 1.4.1 on a 5-node cluster with 8GB RAM each. I am only using a small data set so far: about 50000 users and 5000 products with only about 100000 ratings.

Thanks.

On Sat, Mar 19, 2016 at 7:58 PM, Hiroyuki Yamada <mogwa...@gmail.com> wrote:
> Hi,
>
> I'm testing Collaborative Filtering with MLlib.
> Making a model with ALS.trainImplicit (or train) seems scalable as far as I
> have tested, but I'm wondering how I can get all the recommendation results
> efficiently.
>
> The predictAll method can get all the results,
> but it needs the whole user-product matrix in memory as an input.
> So if there are 1 million users and 1 million products, the number of
> elements is too large (1 million x 1 million),
> and the amount of memory needed to hold them is more than a few TB even
> when the element size is only 4B,
> which is not a realistic amount of memory even now.
>
> # (1000000*1000000)*4/1000/1000/1000/1000 => approximately 4TB
>
> We can, of course, use the predict method per user,
> but, as far as I have tried, it is very slow to get 1 million users' results.
>
> Am I missing something?
> Are there any other, better ways to get all the recommendation results in a
> scalable and efficient way?
>
> Best regards,
> Hiro
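For reference, the back-of-envelope memory estimate from the quoted message can be checked in plain Python (no Spark needed); the counts are the hypothetical 1M x 1M case from the message, not the actual 50000 x 5000 data set:

```python
# Rough memory estimate for a dense user-product score matrix.
num_users = 1_000_000      # hypothetical: 1 million users
num_products = 1_000_000   # hypothetical: 1 million products
bytes_per_element = 4      # e.g. one 32-bit float score per (user, product)

total_bytes = num_users * num_products * bytes_per_element
total_tb = total_bytes / 1000**4  # decimal terabytes

print(total_tb)  # -> 4.0 TB, far beyond a 5-node x 8GB cluster
```

By contrast, the small data set above (50000 users x 5000 products) would still be 50000 * 5000 * 4 bytes = 1 GB of dense scores, which already strains predictAll on 8GB executors once shuffle and object overhead are added.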
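The per-100-users workaround mentioned above can be sketched as follows. This is only a sketch under assumptions: `user_ids`, `product_ids`, `batch_size`, and the commented Spark calls are illustrative, and only one batch's (user, product) cross product is materialized at a time; the batching helper itself is plain Python:

```python
def batches(user_ids, batch_size):
    """Yield successive fixed-size batches of user IDs."""
    for start in range(0, len(user_ids), batch_size):
        yield user_ids[start:start + batch_size]

# For each batch, one would build the (user, product) pairs for just that
# batch and hand them to the model, roughly (Spark calls sketched only):
#   pairs = sc.parallelize([(u, p) for u in batch for p in product_ids])
#   scores = model.predictAll(pairs)
#   ... keep the top 100 products per user from `scores` ...
user_ids = list(range(10))  # illustrative IDs
print([len(b) for b in batches(user_ids, 4)])  # -> [4, 4, 2]
```

This trades memory for time, which matches the behavior described above: small batches fit in memory but many round trips through predictAll are slow.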