Why not just use Mahout for this? It has an item similarity algorithm that does exactly this :)
https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/cf/taste/hadoop/similarity/item/ItemSimilarityJob.html

You can use Mahout in distributed and non-distributed mode as well.

> From: lcguerreroc...@gmail.com
> Date: Fri, 28 Jun 2013 12:16:57 -0500
> Subject: Content based recommender using lucene/solr
> To: solr-user@lucene.apache.org; java-u...@lucene.apache.org
>
> Hi,
>
> I'm using lucene and solr right now in a production environment with an
> index of about a million docs. I'm working on a recommender that basically
> would list the n most similar items to the user based on the current item
> he is viewing.
>
> I've been thinking of using solr/lucene since I already have all docs
> available and I want a quick version that can be deployed while we work on
> a more robust recommender. How about overriding the default similarity so
> that it scores documents based on the euclidean distance of normalized item
> attributes and then using a morelikethis component to pass in the
> attributes of the item for which I want to generate recommendations? I know
> it has its issues like recomputing scores/normalization/weight application
> at query time which could make this idea unfeasible/impractical. I'm at a
> very preliminary stage right now with this and would love some suggestions
> from experienced users.
>
> thank you,
>
> Luis Guerrero
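
For what it's worth, the scoring idea in the quoted message (Euclidean distance over normalized item attributes, then the n closest items) can be sketched in plain Java, independent of both Lucene/Solr and Mahout. This is only an illustration under assumptions: it uses min-max normalization and a brute-force scan, and the class and method names (NearestItems, normalize, mostSimilar) are made up for this example and are not part of any library.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class NearestItems {

    /** Min-max normalizes each attribute column to [0, 1]; constant columns map to 0. */
    public static double[][] normalize(double[][] items) {
        int dims = items[0].length;
        double[][] out = new double[items.length][dims];
        for (int d = 0; d < dims; d++) {
            double min = Double.POSITIVE_INFINITY;
            double max = Double.NEGATIVE_INFINITY;
            for (double[] item : items) {
                min = Math.min(min, item[d]);
                max = Math.max(max, item[d]);
            }
            double range = max - min;
            for (int i = 0; i < items.length; i++) {
                out[i][d] = (range == 0) ? 0.0 : (items[i][d] - min) / range;
            }
        }
        return out;
    }

    /** Euclidean distance between two attribute vectors of equal length. */
    public static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Returns the indices of the n items closest to the current item, nearest first. */
    public static List<Integer> mostSimilar(double[][] normalized, int current, int n) {
        List<Integer> candidates = new ArrayList<>();
        for (int i = 0; i < normalized.length; i++) {
            if (i != current) {
                candidates.add(i); // never recommend the item being viewed
            }
        }
        candidates.sort(Comparator.comparingDouble(
                (Integer i) -> distance(normalized[current], normalized[i])));
        return candidates.subList(0, Math.min(n, candidates.size()));
    }

    public static void main(String[] args) {
        // Hypothetical items with two numeric attributes each.
        double[][] items = { {10, 200}, {12, 210}, {50, 900}, {11, 190} };
        System.out.println(mostSimilar(normalize(items), 0, 2)); // prints [3, 1]
    }
}
```

A linear scan like this computes a distance to every item on each request, so for an index of about a million docs the neighbours would probably have to be precomputed offline rather than at query time.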