You don't need to develop an in-memory implementation; we already have one.
Simply use a GenericItemBasedRecommender and ask it for the most similar items of each item.

On 18.09.2012 14:49, yamo93 wrote:
> Hi,
>
> I have 30.000 items and the computation takes more than 2h on a
> pseudo-cluster, which is too long in my case.
>
> I think of some ways to reduce the execution time of RowSimilarityJob
> and I wonder if some of you have implemented them and how, or explored
> other ways.
> 1. tune the JVM
> 2. developing an in memory implementation (i.e. without hadoop)
> 3. reduce the size of the matrix (by removing those which have no words
> in common, for example)
> 4. run on real hadoop cluster with several nodes (does anyone have an
> idea of the number of nodes to make it interesting)
>
> Thanks for your help,
> Yann.
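To make concrete what an in-memory "most similar items of each item" query computes, here is a minimal plain-Java sketch. It is not Mahout's code: the class name, method names, and the choice of cosine similarity over term-weight maps are assumptions made for illustration. It also uses an inverted index so that items with no words in common are never compared, which is idea 3 from the quoted message. In Mahout itself the corresponding call should be mostSimilarItems(itemID, howMany) on the GenericItemBasedRecommender; check the Javadoc of your version.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: hypothetical class, not part of Mahout.
class InMemoryItemSimilarity {

    /** Euclidean length of a sparse term-weight vector. */
    static double norm(Map<String, Double> v) {
        double sum = 0.0;
        for (double x : v.values()) {
            sum += x * x;
        }
        return Math.sqrt(sum);
    }

    /** Cosine similarity between two sparse term-weight vectors. */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        // Iterate over the smaller map and look terms up in the larger one.
        Map<String, Double> small = a.size() <= b.size() ? a : b;
        Map<String, Double> large = (small == a) ? b : a;
        double dot = 0.0;
        for (Map.Entry<String, Double> e : small.entrySet()) {
            Double w = large.get(e.getKey());
            if (w != null) {
                dot += e.getValue() * w;
            }
        }
        double na = norm(a);
        double nb = norm(b);
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / (na * nb);
    }

    /** For each item, the IDs of up to topN most similar items, best first. */
    static Map<Long, List<Long>> mostSimilar(Map<Long, Map<String, Double>> items, int topN) {
        // Inverted index: term -> IDs of the items that contain it.
        Map<String, List<Long>> index = new HashMap<>();
        for (Map.Entry<Long, Map<String, Double>> e : items.entrySet()) {
            for (String term : e.getValue().keySet()) {
                index.computeIfAbsent(term, k -> new ArrayList<>()).add(e.getKey());
            }
        }

        Map<Long, List<Long>> result = new HashMap<>();
        for (Map.Entry<Long, Map<String, Double>> e : items.entrySet()) {
            Long id = e.getKey();
            // Candidates are only the items sharing at least one term with this one.
            Set<Long> candidates = new HashSet<>();
            for (String term : e.getValue().keySet()) {
                candidates.addAll(index.get(term));
            }
            candidates.remove(id);

            List<Map.Entry<Long, Double>> scored = new ArrayList<>();
            for (Long other : candidates) {
                scored.add(new AbstractMap.SimpleEntry<>(other, cosine(e.getValue(), items.get(other))));
            }
            // Highest similarity first.
            scored.sort((x, y) -> Double.compare(y.getValue(), x.getValue()));

            List<Long> top = new ArrayList<>();
            for (int i = 0; i < Math.min(topN, scored.size()); i++) {
                top.add(scored.get(i).getKey());
            }
            result.put(id, top);
        }
        return result;
    }

    public static void main(String[] args) {
        // Tiny made-up data set: item ID -> term weights.
        Map<Long, Map<String, Double>> items = new HashMap<>();
        items.put(1L, Map.of("hadoop", 1.0, "mahout", 1.0));
        items.put(2L, Map.of("hadoop", 1.0, "mahout", 1.0, "jvm", 1.0));
        items.put(3L, Map.of("jvm", 1.0));
        System.out.println(mostSimilar(items, 2));
    }
}
```

With 30.000 items the term-weight maps should fit comfortably in the heap of a single JVM, and the candidate step keeps the work proportional to the pairs that actually share a word rather than to all 30.000 x 30.000 pairs. How much that saves depends on how common the most frequent words are.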
