Oh, I overlooked that, sorry. You could give it (document, term, tfidf) pairs instead. If you find it awkward to use a recommender to compute document similarities, then maybe it would be better to think about a custom in-memory implementation.
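For 30,000 documents, such a custom in-memory implementation can be fairly simple. Here is a rough sketch (plain Java, no Mahout or Hadoop dependencies; all class and method names are illustrative, not from any Mahout API) that computes the cosine similarity between two sparse tf-idf vectors held as term-id-to-weight maps:

```java
import java.util.HashMap;
import java.util.Map;

public class InMemoryCosine {

    // Cosine similarity between two sparse tf-idf vectors,
    // each represented as a map from term id to tf-idf weight.
    static double cosine(Map<Integer, Double> a, Map<Integer, Double> b) {
        // Iterate over the smaller vector; only shared terms contribute.
        if (a.size() > b.size()) {
            Map<Integer, Double> tmp = a; a = b; b = tmp;
        }
        double dot = 0.0;
        for (Map.Entry<Integer, Double> e : a.entrySet()) {
            Double w = b.get(e.getKey());
            if (w != null) {
                dot += e.getValue() * w;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / (normA * normB);
    }

    // Euclidean norm of a sparse vector.
    static double norm(Map<Integer, Double> v) {
        double s = 0.0;
        for (double w : v.values()) {
            s += w * w;
        }
        return Math.sqrt(s);
    }

    public static void main(String[] args) {
        // Two toy documents sharing a single term (term id 2).
        Map<Integer, Double> doc1 = new HashMap<>();
        doc1.put(1, 1.0);
        doc1.put(2, 2.0);
        Map<Integer, Double> doc2 = new HashMap<>();
        doc2.put(2, 2.0);
        doc2.put(3, 1.0);
        System.out.println(cosine(doc1, doc2));
    }
}
```

Looping this over all pairs is O(n^2) in the number of documents, but with sparse vectors and a single JVM it may well beat a 2h pseudo-cluster run for 30,000 items; pruning pairs with no terms in common (your option 3) falls out naturally from the sparse dot product.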
On 18.09.2012 15:13, yamo93 wrote:
> Thanks,
>
> I need some explanations:
> GenericItemBasedRecommender needs a FileDataModel with userId, itemId,
> score.
> But I have some text documents, and today I use seq2sparse followed by
> rowid + rowsimilarity.
> How do I call GenericItemBasedRecommender with sparse vectors?
>
> Y.
>
> On 09/18/2012 02:57 PM, Sebastian Schelter wrote:
>> You don't need to develop an in-memory implementation, we already have
>> that.
>>
>> Simply use a GenericItemBasedRecommender and ask it for the most similar
>> items of each item.
>>
>>
>> On 18.09.2012 14:49, yamo93 wrote:
>>> Hi,
>>>
>>> I have 30,000 items and the computation takes more than 2h on a
>>> pseudo-cluster, which is too long in my case.
>>>
>>> I can think of some ways to reduce the execution time of
>>> RowSimilarityJob, and I wonder whether some of you have implemented
>>> them, and how, or have explored other ways:
>>> 1. tune the JVM
>>> 2. develop an in-memory implementation (i.e. without Hadoop)
>>> 3. reduce the size of the matrix (by removing rows that have no words
>>> in common, for example)
>>> 4. run on a real Hadoop cluster with several nodes (does anyone have an
>>> idea of the number of nodes needed to make it worthwhile?)
>>>
>>> Thanks for your help,
>>> Yann.
