Hello,
If the document is second, the result will be recommendations of documents
to terms, won't it?
What I need is to find the most similar documents (i.e. recommend documents
to documents, right?)
Thanks for your help,
On 09/18/2012 04:03 PM, Sebastian Schelter wrote:
Another error of mine, sorry, documents need to be second :)
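A minimal sketch of what that corrected input could look like, with purely
illustrative ids and weights (terms in the first column, documents in the
second, so documents act as the "items"):

7,100,0.83
7,101,0.12
42,100,0.40

With that ordering, asking for the most similar items of document 100
returns documents, e.g. (recommender set up as in the sketch further down
in the thread):

List<RecommendedItem> similarDocs = recommender.mostSimilarItems(100L, 10);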
On 09/18/2012 03:58 PM, yamo93 wrote:
If the document is in first place, should I use a user-based recommender
instead of an item-based one?
On 09/18/2012 03:21 PM, Sebastian Schelter wrote:
Oh, I overlooked that, sorry. You could give it (document,term,tfidf)
triples instead. If you find it awkward to use a recommender to compute
document similarities, then maybe it would be better to think about a
custom in-memory implementation.
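For the in-memory route, a rough sketch of what a brute-force pairwise
computation over the tf-idf vectors could look like (class and method names
are just illustrative, and cosine is only one possible measure):

import java.util.List;

import org.apache.mahout.common.distance.CosineDistanceMeasure;
import org.apache.mahout.math.Vector;

// Brute-force O(n^2) document-document similarity, entirely in memory.
public class InMemoryDocSimilarity {
  public static double[][] similarities(List<Vector> docs) {
    CosineDistanceMeasure cosine = new CosineDistanceMeasure();
    int n = docs.size();
    double[][] sim = new double[n][n];
    for (int i = 0; i < n; i++) {
      for (int j = i + 1; j < n; j++) {
        // CosineDistanceMeasure returns a distance, i.e. 1 - cosine similarity
        double s = 1.0 - cosine.distance(docs.get(i), docs.get(j));
        sim[i][j] = s;
        sim[j][i] = s;
      }
    }
    return sim;
  }
}

With 30,000 documents that is roughly 450 million vector pairs, so this only
pays off if the vectors are sparse and everything fits in memory.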
On 18.09.2012 15:13, yamo93 wrote:
Thanks,
I need some clarification:
GenericItemBasedRecommender needs a FileDataModel with userId, itemId, and
score.
But I have text documents, and today I use seq2sparse followed by
rowid + rowsimilarity.
How do I call GenericItemBasedRecommender with sparse vectors?
Y.
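For reference, a rough sketch of dumping the seq2sparse tfidf vectors into
the (userId, itemId, score) text format that FileDataModel reads; the paths
and the running document id below are placeholders, and documents are
written in the item (second) column, per the correction further up in the
thread:

import java.io.PrintWriter;
import java.util.Iterator;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.VectorWritable;

// Dumps (termId, docId, tfidf) triples from the seq2sparse output.
public class TfidfToTriples {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path vectors = new Path("output/tfidf-vectors/part-r-00000"); // placeholder path
    PrintWriter out = new PrintWriter("term-doc-tfidf.csv");

    SequenceFile.Reader reader = new SequenceFile.Reader(fs, vectors, conf);
    Text docName = new Text();
    VectorWritable vw = new VectorWritable();
    long docId = 0; // simple running id; something like rowid's docIndex could map ids back to names
    while (reader.next(docName, vw)) {
      Vector v = vw.get();
      for (Iterator<Vector.Element> it = v.iterateNonZero(); it.hasNext();) {
        Vector.Element e = it.next();
        out.println(e.index() + "," + docId + "," + e.get());
      }
      docId++;
    }
    reader.close();
    out.close();
  }
}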
On 09/18/2012 02:57 PM, Sebastian Schelter wrote:
You don't need to develop an in-memory implementation; we already have
that.
Simply use a GenericItemBasedRecommender and ask it for the most
similar
items of each item.
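A minimal sketch of that usage, assuming the preferences end up in a
comma-separated text file of (termId, docId, tfidf) lines as discussed
further up in the thread (file name and the choice of similarity are only
illustrative):

import java.io.File;
import java.util.List;

import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.UncenteredCosineSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

public class MostSimilarDocs {
  public static void main(String[] args) throws Exception {
    // Each line of the file: termId,docId,tfidf (documents as the "items").
    DataModel model = new FileDataModel(new File("term-doc-tfidf.csv"));
    ItemSimilarity similarity = new UncenteredCosineSimilarity(model);
    GenericItemBasedRecommender recommender =
        new GenericItemBasedRecommender(model, similarity);

    // Ask for the most similar "items" (i.e. documents) of each document.
    LongPrimitiveIterator docIds = model.getItemIDs();
    while (docIds.hasNext()) {
      long docId = docIds.nextLong();
      List<RecommendedItem> similarDocs = recommender.mostSimilarItems(docId, 10);
      System.out.println(docId + " -> " + similarDocs);
    }
  }
}

Any other ItemSimilarity implementation can be plugged in instead of the
cosine one used here.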
On 18.09.2012 14:49, yamo93 wrote:
Hi,
I have 30,000 items and the computation takes more than 2 hours on a
pseudo-cluster, which is too long in my case.
I can think of some ways to reduce the execution time of RowSimilarityJob,
and I wonder whether some of you have implemented them (and how), or
explored other ways:
1. tune the JVM
2. develop an in-memory implementation (i.e. without Hadoop)
3. reduce the size of the matrix (for example, by removing documents which
have no words in common)
4. run on a real Hadoop cluster with several nodes (does anyone have an
idea of how many nodes it takes to make this worthwhile?)
Thanks for your help,
Yann.