Re: Need to reduce execution time of RowSimilarityJob

Sebastian Schelter Tue, 18 Sep 2012 05:58:20 -0700

You don't need to develop an in-memory implementation; we already have one.

Simply use a GenericItemBasedRecommender and ask it for the most similar
items of each item.
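
To make the idea concrete, here is a minimal plain-Java sketch of what an
in-memory "most similar items per item" computation boils down to. It is not
Mahout's code: the class InMemoryRowSimilarity, its method names, the sparse
row layout (term index -> weight) and the choice of cosine similarity are all
assumptions made for illustration. In Mahout you would instead call
mostSimilarItems(itemID, howMany) on the GenericItemBasedRecommender.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of Mahout: brute-force in-memory row similarity. */
public class InMemoryRowSimilarity {

    /** Cosine similarity of two sparse vectors stored as term index -> weight. */
    static double cosine(Map<Integer, Double> a, Map<Integer, Double> b) {
        // Walk the smaller vector; only terms present in both add to the dot product.
        Map<Integer, Double> small = a.size() <= b.size() ? a : b;
        Map<Integer, Double> large = (small == a) ? b : a;
        double dot = 0.0;
        for (Map.Entry<Integer, Double> e : small.entrySet()) {
            Double w = large.get(e.getKey());
            if (w != null) {
                dot += e.getValue() * w;
            }
        }
        if (dot == 0.0) {
            return 0.0; // no overlap (this also avoids dividing by a zero norm)
        }
        return dot / (norm(a) * norm(b));
    }

    /** Euclidean length of a sparse vector. */
    static double norm(Map<Integer, Double> v) {
        double sum = 0.0;
        for (double w : v.values()) {
            sum += w * w;
        }
        return Math.sqrt(sum);
    }

    /** IDs of the howMany rows most similar to itemId, best first. */
    static List<Long> mostSimilar(Map<Long, Map<Integer, Double>> rows,
                                  long itemId, int howMany) {
        Map<Integer, Double> target = rows.get(itemId);
        if (target == null) {
            throw new IllegalArgumentException("Unknown item: " + itemId);
        }
        List<Map.Entry<Long, Double>> scored = new ArrayList<>();
        for (Map.Entry<Long, Map<Integer, Double>> row : rows.entrySet()) {
            if (row.getKey() == itemId) {
                continue; // do not compare the item with itself
            }
            double sim = cosine(target, row.getValue());
            if (sim > 0.0) {
                // Rows with zero similarity are never kept.
                scored.add(Map.entry(row.getKey(), sim));
            }
        }
        // Highest similarity first.
        scored.sort(Map.Entry.<Long, Double>comparingByValue().reversed());
        List<Long> result = new ArrayList<>();
        for (int i = 0; i < Math.min(howMany, scored.size()); i++) {
            result.add(scored.get(i).getKey());
        }
        return result;
    }
}
```

Calling mostSimilar(rows, id, n) once per item gives the full result. This
brute-force version compares every pair of rows, so it only illustrates the
shape of the computation; it makes no claim about how it would perform on
30,000 items.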


On 18.09.2012 14:49, yamo93 wrote:
> Hi,
> 
> I have 30,000 items and the computation takes more than 2 hours on a
> pseudo-cluster, which is too long in my case.
> 
> I have thought of some ways to reduce the execution time of
> RowSimilarityJob, and I wonder whether any of you have implemented them
> (and how), or explored other ways:
> 1. tune the JVM
> 2. develop an in-memory implementation (i.e. without Hadoop)
> 3. reduce the size of the matrix (by removing rows which have no words
> in common, for example)
> 4. run on a real Hadoop cluster with several nodes (does anyone have an
> idea of how many nodes it takes to make this worthwhile?)
> 
> Thanks for your help,
> Yann.
