I don't know if you can speed this up very directly. You can try a
different similarity metric. But if you really want to compute every
item-item pair, it's necessarily going to scale as the square of the
number of items, and that will be slow. Consider whether you need to
precompute every pair.
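
To illustrate the scaling point: a brute-force sketch of computing the top-N most similar items for a single item, using cosine similarity over hypothetical per-item rating vectors (the data layout and function names here are illustrative, not Mahout's API). One query is O(n); precomputing this for every item is O(n²).

```python
import math

def cosine(a, b):
    # Cosine similarity between two sparse rating dicts {user_id: rating}.
    common = set(a) & set(b)
    num = sum(a[u] * b[u] for u in common)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def top_n_similar(target, items, n=5):
    # items: {item_id: {user_id: rating}}.
    # One linear scan over all other items -- O(n) per query,
    # so precomputing for every item is O(n^2) overall.
    scores = [(cosine(items[target], vec), iid)
              for iid, vec in items.items() if iid != target]
    return [iid for _, iid in sorted(scores, reverse=True)[:n]]
```

With 200,000 items that is roughly 2 × 10^10 similarity evaluations for the full precomputation, which is why computing only the pairs you actually need (or sampling candidates) matters.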

On Fri, Sep 28, 2012 at 5:07 PM, Abhishek Roy <[email protected]> wrote:
> Thanks for your inputs, Sean. I implemented the top-N (most similar items)
> lookup by reusing the most similar items already available. It works fine.
> Now, scale in action! Testing with a set of 200,000 items, computing the
> most similar items for 1 item takes around 20 secs.
> My approach is to pre-compute the most similar items for all 200,000 items.
> I am not looking at Hadoop for now (2000 item base currently). I know I can
> reduce my data size for similarity computation.
> What are my options?