Oh, I overlooked that, sorry. You could give it (document, term, tfidf) pairs instead. If you find it awkward to use a recommender to compute document similarities, then maybe it would be better to think about a custom in-memory implementation.
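For 30,000 documents, such a custom in-memory implementation can be fairly simple. Here is a rough sketch (plain Java, no Mahout or Hadoop dependencies; all class and method names are illustrative, not from any Mahout API) that computes the cosine similarity between two sparse tf-idf vectors held as term-id-to-weight maps:

```java
import java.util.HashMap;
import java.util.Map;

public class InMemoryCosine {

    // Cosine similarity between two sparse tf-idf vectors,
    // each represented as a map from term id to tf-idf weight.
    static double cosine(Map<Integer, Double> a, Map<Integer, Double> b) {
        // Iterate over the smaller vector; only shared terms contribute.
        if (a.size() > b.size()) {
            Map<Integer, Double> tmp = a; a = b; b = tmp;
        }
        double dot = 0.0;
        for (Map.Entry<Integer, Double> e : a.entrySet()) {
            Double w = b.get(e.getKey());
            if (w != null) {
                dot += e.getValue() * w;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / (normA * normB);
    }

    // Euclidean norm of a sparse vector.
    static double norm(Map<Integer, Double> v) {
        double s = 0.0;
        for (double w : v.values()) {
            s += w * w;
        }
        return Math.sqrt(s);
    }

    public static void main(String[] args) {
        // Two toy documents sharing a single term (term id 2).
        Map<Integer, Double> doc1 = new HashMap<>();
        doc1.put(1, 1.0);
        doc1.put(2, 2.0);
        Map<Integer, Double> doc2 = new HashMap<>();
        doc2.put(2, 2.0);
        doc2.put(3, 1.0);
        System.out.println(cosine(doc1, doc2));
    }
}
```

Looping this over all pairs is O(n^2) in the number of documents, but with sparse vectors and a single JVM it may well beat a 2h pseudo-cluster run for 30,000 items; pruning pairs with no terms in common (your option 3) falls out naturally from the sparse dot product.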
On 18.09.2012 15:13, yamo93 wrote:
> Thanks,
>
> I need some explanations:
> GenericItemBasedRecommender needs a FileDataModel with userId, itemId,
> score.
> But I have some text documents, and today I use seq2sparse followed by
> rowid + rowsimilarity.
> How do I call GenericItemBasedRecommender with sparse vectors?
>
> Y.
>
> On 09/18/2012 02:57 PM, Sebastian Schelter wrote:
>> You don't need to develop an in-memory implementation, we already have
>> that.
>>
>> Simply use a GenericItemBasedRecommender and ask it for the most similar
>> items of each item.
>>
>>
>> On 18.09.2012 14:49, yamo93 wrote:
>>> Hi,
>>>
>>> I have 30,000 items and the computation takes more than 2h on a
>>> pseudo-cluster, which is too long in my case.
>>>
>>> I can think of some ways to reduce the execution time of
>>> RowSimilarityJob, and I wonder whether some of you have implemented
>>> them, and how, or have explored other ways:
>>> 1. tune the JVM
>>> 2. develop an in-memory implementation (i.e. without Hadoop)
>>> 3. reduce the size of the matrix (by removing rows that have no words
>>> in common, for example)
>>> 4. run on a real Hadoop cluster with several nodes (does anyone have an
>>> idea of the number of nodes needed to make it worthwhile?)
>>>
>>> Thanks for your help,
>>> Yann.
